Artificial Intelligence and Machine Learning


Bank Churn Prediction

Problem Statement¶

Context¶

Service businesses such as banks have to worry about customer churn, i.e. customers leaving to join another service provider. It is important to understand which aspects of the service influence a customer's decision to leave, so that management can concentrate improvement efforts on those priorities.

Objective¶

As a data scientist with the bank, you need to build a neural-network-based classifier that predicts whether a customer will leave the bank in the next 6 months.

Data Dictionary¶

  • CustomerId: Unique ID which is assigned to each customer

  • Surname: Last name of the customer

  • CreditScore: It defines the credit history of the customer.

  • Geography: A customer’s location

  • Gender: It defines the Gender of the customer

  • Age: Age of the customer

  • Tenure: Number of years for which the customer has been with the bank

  • NumOfProducts: refers to the number of products that a customer has purchased through the bank.

  • Balance: Account balance

  • HasCrCard: A categorical variable indicating whether the customer has a credit card.

  • EstimatedSalary: Estimated salary

  • IsActiveMember: A categorical variable indicating whether the customer is an active member of the bank (active in the sense of using bank products regularly, making transactions, etc.).

  • Exited: Whether the customer left the bank within six months. It takes two values: 0 = No (customer did not leave the bank), 1 = Yes (customer left the bank).

Importing necessary libraries¶

In [1]:
# Installing the libraries with the specified version.
!pip install tensorflow==2.15.0 scikit-learn==1.2.2 seaborn==0.13.1 matplotlib==3.7.1 numpy==1.25.2 pandas==2.0.3 imbalanced-learn==0.10.1 -q --user

Note: After running the above cell, please restart the notebook kernel/runtime (depending on whether you're using Jupyter Notebook or Google Colab) and then sequentially run all cells from the one below.

In [200]:
# Libraries to help with reading and manipulating data
import pandas as pd
import numpy as np

# libraries to help with data visualization
import matplotlib.pyplot as plt
import seaborn as sns

# Library to split data
from sklearn.model_selection import train_test_split

# library to import to standardize the data
from sklearn.preprocessing import StandardScaler, LabelEncoder

# importing different functions to build models
import tensorflow as tf
from tensorflow import keras
from keras import backend
from keras.models import Sequential
from keras.layers import Dense, Dropout

# importing SMOTE
from imblearn.over_sampling import SMOTE

# importing metrics
from sklearn.metrics import (
    confusion_matrix,
    roc_curve,
    auc,
    classification_report,
    recall_score,
    precision_recall_curve,
)

import random
import time

# Library to avoid the warnings
import warnings
warnings.filterwarnings("ignore")
In [3]:
# Set the seed using keras.utils.set_random_seed. This will set:
# 1) `numpy` seed
# 2) backend random seed
# 3) `python` random seed
tf.keras.utils.set_random_seed(812)

# If using TensorFlow, this will make GPU ops as deterministic as possible,
# but it will affect the overall performance, so be mindful of that.
tf.config.experimental.enable_op_determinism()

Loading the dataset¶

In [214]:
# Loading dataset from my Google Drive same way I did on my previous Projects
from google.colab import drive
drive.mount('/content/drive')
Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).

Reading the dataset¶

In [216]:
# Inputting the file path from my Google Drive to where the bank churn data set (mod4_wk3_bank-1.csv) is located

df = pd.read_csv('/content/drive/MyDrive/Colab Notebooks/mod4_wk3_bank-1.csv')
In [6]:
# Making a copy of the dataframe to avoid making any changes to the original dataset.
data = df.copy()

Understanding the structure of the data¶

In [7]:
# let's view the first 5 rows of the data
data.head()
Out[7]:
   RowNumber  CustomerId   Surname  CreditScore  Geography  Gender  Age  Tenure    Balance  NumOfProducts  HasCrCard  IsActiveMember  EstimatedSalary  Exited
0          1    15634602  Hargrave          619     France  Female   42       2       0.00              1          1               1        101348.88       1
1          2    15647311      Hill          608      Spain  Female   41       1   83807.86              1          0               1        112542.58       0
2          3    15619304      Onio          502     France  Female   42       8  159660.80              3          1               0        113931.57       1
3          4    15701354      Boni          699     France  Female   39       1       0.00              2          0               0         93826.63       0
4          5    15737888  Mitchell          850      Spain  Female   43       2  125510.82              1          1               1         79084.10       0
  • Observations:
  • From a quick view of the first 5 rows, we can probably remove some columns during data preparation, such as RowNumber, Surname and CustomerId, which will not be useful for model building.
  • I do see some zero account balances. I may decide to leave those, undecided until I dig a little deeper into the data.
In [8]:
# let's view the last 5 rows of the data
data.tail()
Out[8]:
      RowNumber  CustomerId    Surname  CreditScore  Geography  Gender  Age  Tenure    Balance  NumOfProducts  HasCrCard  IsActiveMember  EstimatedSalary  Exited
9995       9996    15606229   Obijiaku          771     France    Male   39       5       0.00              2          1               0         96270.64       0
9996       9997    15569892  Johnstone          516     France    Male   35      10   57369.61              1          1               1        101699.77       0
9997       9998    15584532        Liu          709     France  Female   36       7       0.00              1          0               1         42085.58       1
9998       9999    15682355  Sabbatini          772    Germany    Male   42       3   75075.31              2          1               0         92888.52       1
9999      10000    15628319     Walker          792     France  Female   28       4  130142.79              1          1               0         38190.78       0
  • Observations:
  • From a quick view of the last 5 rows of data it looks like there are about 10,000 rows and from above, there are 14 columns.
  • I see more zero account balances as well.

Understand the shape of the dataset¶

In [9]:
# Checking the number of rows and columns in the data
data.shape
Out[9]:
(10000, 14)
  • Observations:

As confirmed above, there are 10000 rows and 14 columns as indicated from the data.shape command above.

Check the data types of the columns for the dataset¶

In [10]:
# data.info will give me the range of data types across the columns
# including column names, non-null counts and data types of the columns.

# data.dtypes will list each column's data type.
# data.info() gives more details, including non-null counts and memory usage

data.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10000 entries, 0 to 9999
Data columns (total 14 columns):
 #   Column           Non-Null Count  Dtype  
---  ------           --------------  -----  
 0   RowNumber        10000 non-null  int64  
 1   CustomerId       10000 non-null  int64  
 2   Surname          10000 non-null  object 
 3   CreditScore      10000 non-null  int64  
 4   Geography        10000 non-null  object 
 5   Gender           10000 non-null  object 
 6   Age              10000 non-null  int64  
 7   Tenure           10000 non-null  int64  
 8   Balance          10000 non-null  float64
 9   NumOfProducts    10000 non-null  int64  
 10  HasCrCard        10000 non-null  int64  
 11  IsActiveMember   10000 non-null  int64  
 12  EstimatedSalary  10000 non-null  float64
 13  Exited           10000 non-null  int64  
dtypes: float64(2), int64(9), object(3)
memory usage: 1.1+ MB
  • Observations:
  1. There are 14 columns and 10000 total rows ranging from index 0 to 9999.
  2. There is no missing data as indicated by the full 10000 non-null values across the columns.
  3. There are 11 numeric columns and 3 text or object type.
  4. The associated memory usage of this data frame is 1.1+ MB. This may come down a bit once we begin data preparation for modeling.

Checking the Statistical Summary¶

In [11]:
data.describe().T
Out[11]:
                   count          mean           std          min          25%           50%           75%          max
RowNumber        10000.0  5.000500e+03   2886.895680         1.00      2500.75  5.000500e+03  7.500250e+03     10000.00
CustomerId       10000.0  1.569094e+07  71936.186123  15565701.00  15628528.25  1.569074e+07  1.575323e+07  15815690.00
CreditScore      10000.0  6.505288e+02     96.653299       350.00       584.00  6.520000e+02  7.180000e+02       850.00
Age              10000.0  3.892180e+01     10.487806        18.00        32.00  3.700000e+01  4.400000e+01        92.00
Tenure           10000.0  5.012800e+00      2.892174         0.00         3.00  5.000000e+00  7.000000e+00        10.00
Balance          10000.0  7.648589e+04  62397.405202         0.00         0.00  9.719854e+04  1.276442e+05    250898.09
NumOfProducts    10000.0  1.530200e+00      0.581654         1.00         1.00  1.000000e+00  2.000000e+00         4.00
HasCrCard        10000.0  7.055000e-01      0.455840         0.00         0.00  1.000000e+00  1.000000e+00         1.00
IsActiveMember   10000.0  5.151000e-01      0.499797         0.00         0.00  1.000000e+00  1.000000e+00         1.00
EstimatedSalary  10000.0  1.000902e+05  57510.492818        11.58     51002.11  1.001939e+05  1.493882e+05    199992.48
Exited           10000.0  2.037000e-01      0.402769         0.00         0.00  0.000000e+00  0.000000e+00         1.00
  • Observations:
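  • The oldest customer is 92.
  • The 25th percentile of Balance is 0, so at least a quarter of customers hold a zero balance.
  • Otherwise, in reviewing the numerical data, I don't see any items that directly stand out.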

Checking for duplicates and null values¶

In [12]:
# Let's check for duplicate values in the data
data.duplicated().sum()
Out[12]:
0
In [13]:
# Let's check for missing values in the data
round(data.isnull().sum() / data.isnull().count() * 100, 2)
Out[13]:
0
RowNumber 0.0
CustomerId 0.0
Surname 0.0
CreditScore 0.0
Geography 0.0
Gender 0.0
Age 0.0
Tenure 0.0
Balance 0.0
NumOfProducts 0.0
HasCrCard 0.0
IsActiveMember 0.0
EstimatedSalary 0.0
Exited 0.0

  • Observations:
  • Based on the above review, there are no duplicated rows and there are no null values. All rows/columns are filled in.
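
The percentage-missing expression above divides the per-column null count by the per-column row count. A minimal sketch on a made-up four-row frame (the `toy` frame and its values are illustrative only, not taken from the bank data) suggests that `isnull().mean()` gives the same result more compactly:

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame: one missing value in column "b"
toy = pd.DataFrame({"a": [1.0, 2.0, 3.0, 4.0], "b": [1.0, np.nan, 3.0, 4.0]})

# Form used in the notebook: null count / total count, as a percentage
pct_long = round(toy.isnull().sum() / toy.isnull().count() * 100, 2)

# Shorter form: the mean of a boolean mask is the fraction of True values
pct_short = (toy.isnull().mean() * 100).round(2)

print(pct_long)
print(pct_short)
```

Both forms report 0% missing for column `a` and 25% for column `b` on this toy frame.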

Let's check the count of each unique category in each of the categorical variables¶

In [14]:
# Making a list of all categorical variables and assign to cat_col
cat_col = list(data.select_dtypes("object").columns)

# Printing number of count of each unique value in each column
for column in cat_col:
    print("Unique values in", column, "are :")
    print(data[column].value_counts())
    print("-" * 50)
Unique values in Surname are :
Surname
Smith        32
Martin       29
Scott        29
Walker       28
Brown        26
             ..
Wells         1
Calzada       1
Gresswell     1
Aguirre       1
Morales       1
Name: count, Length: 2932, dtype: int64
--------------------------------------------------
Unique values in Geography are :
Geography
France     5014
Germany    2509
Spain      2477
Name: count, dtype: int64
--------------------------------------------------
Unique values in Gender are :
Gender
Male      5457
Female    4543
Name: count, dtype: int64
--------------------------------------------------
  • Observations:

  • Based on the above, I will probably remove the surname column as it will not add value during model building.

  • Gender is fairly evenly split.

  • France has the greatest number of customers.

Let's check the number of unique values in each column¶

In [15]:
# Let's check the number of unique values in each column
data.nunique()
Out[15]:
0
RowNumber 10000
CustomerId 10000
Surname 2932
CreditScore 460
Geography 3
Gender 2
Age 70
Tenure 11
Balance 6382
NumOfProducts 4
HasCrCard 2
IsActiveMember 2
EstimatedSalary 9999
Exited 2

  • Observations:

  • Based on the above, I don't see any particular item that stands out in the unique values across the columns.

Checking percentages of our Target Variable¶

In [16]:
data["Exited"].value_counts(1)
Out[16]:
proportion
Exited
0 0.7963
1 0.2037

  • Observations:

  • Based on the above, it looks like ~80% of customers have not left the bank for another bank and ~20% have left.
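
This class split sets a floor for judging any classifier. As a rough check using only the counts implied by the proportions above (7,963 stayed, 2,037 left), a trivial model that always predicts "stays" would already reach about 79.6% accuracy while catching no churners, which is why accuracy alone can be misleading here:

```python
# Class counts implied by the value_counts output above (10,000 rows total)
stayed, exited = 7963, 2037
total = stayed + exited

# A "model" that predicts 0 (stays) for everyone:
# every stayer is a true negative, every churner is a false negative
baseline_accuracy = stayed / total
baseline_recall = 0 / exited  # no churner is ever flagged

print(f"Baseline accuracy: {baseline_accuracy:.4f}")
print(f"Baseline recall on churners: {baseline_recall:.1f}")
```

Any model built later should be compared against this baseline, with recall on the churn class as a key metric.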

Dropping columns Surname, RowNumber, and CustomerID¶

In [17]:
# RowNumber and CustomerId are unique per row and Surname is a high-cardinality name field; none of them helps prediction, so drop all three
data = data.drop(['RowNumber', 'CustomerId', 'Surname'], axis=1)
In [18]:
# Let's verify there are only 11 columns now in our dataset
data.shape
Out[18]:
(10000, 11)
In [19]:
# Let's verify the 'RowNumber', 'CustomerId', and 'Surname' columns have in fact been dropped from our dataset.
data.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10000 entries, 0 to 9999
Data columns (total 11 columns):
 #   Column           Non-Null Count  Dtype  
---  ------           --------------  -----  
 0   CreditScore      10000 non-null  int64  
 1   Geography        10000 non-null  object 
 2   Gender           10000 non-null  object 
 3   Age              10000 non-null  int64  
 4   Tenure           10000 non-null  int64  
 5   Balance          10000 non-null  float64
 6   NumOfProducts    10000 non-null  int64  
 7   HasCrCard        10000 non-null  int64  
 8   IsActiveMember   10000 non-null  int64  
 9   EstimatedSalary  10000 non-null  float64
 10  Exited           10000 non-null  int64  
dtypes: float64(2), int64(7), object(2)
memory usage: 859.5+ KB
  • Observations:

  • As expected there are only 11 columns now and the 3 columns were in fact dropped from our dataset.

Exploratory Data Analysis¶

The below functions need to be defined to carry out the Exploratory Data Analysis.

In [20]:
def histogram_boxplot(data, feature, figsize=(15, 10), kde=False, bins=None):
    """
    Boxplot and histogram combined

    data: dataframe
    feature: dataframe column
    figsize: size of figure (default (15,10))
    kde: whether to show the density curve (default False)
    bins: number of bins for histogram (default None)
    """
    f2, (ax_box2, ax_hist2) = plt.subplots(
        nrows=2,  # Number of rows of the subplot grid= 2
        sharex=True,  # x-axis will be shared among all subplots
        gridspec_kw={"height_ratios": (0.25, 0.75)},  # 25% / 75% height split for boxplot / histogram
        figsize=figsize,
    )  # creating the 2 subplots
    sns.boxplot(
        data=data, x=feature, ax=ax_box2, showmeans=True, color="violet"
    )  # boxplot will be created and a triangle will indicate the mean value of the column
    sns.histplot(
        data=data, x=feature, kde=kde, ax=ax_hist2, bins=bins
    ) if bins else sns.histplot(
        data=data, x=feature, kde=kde, ax=ax_hist2
    )  # For histogram
    ax_hist2.axvline(
        data[feature].mean(), color="green", linestyle="--"
    )  # Add mean to the histogram with green line
    ax_hist2.axvline(
        data[feature].median(), color="black", linestyle="-"
    )  # Add median to the histogram with black line
In [21]:
# function to create labeled barplots


def labeled_barplot(data, feature, perc=False, n=None):
    """
    Barplot with percentage at the top

    data: dataframe
    feature: dataframe column
    perc: whether to display percentages instead of count (default is False)
    n: displays the top n category levels (default is None, i.e., display all levels)
    """

    total = len(data[feature])  # length of the column
    count = data[feature].nunique()
    if n is None:
        plt.figure(figsize=(count + 1, 5))
    else:
        plt.figure(figsize=(n + 1, 5))

    plt.xticks(rotation=90, fontsize=15)
    ax = sns.countplot(
        data=data,
        x=feature,
        palette="Paired",
        order=data[feature].value_counts().index[:n].sort_values(),
    )

    for p in ax.patches:
        if perc:
            label = "{:.1f}%".format(
                100 * p.get_height() / total
            )  # percentage of each class of the category
        else:
            label = p.get_height()  # count of each level of the category

        x = p.get_x() + p.get_width() / 2  # horizontal center of the bar
        y = p.get_height()  # top of the bar

        ax.annotate(
            label,
            (x, y),
            ha="center",
            va="center",
            size=12,
            xytext=(0, 5),
            textcoords="offset points",
        )  # annotate the percentage

    plt.show()  # show the plot

Univariate Analysis¶

Observations on Geography¶

In [22]:
# Calling the above function to create a barplot for 'Geography'
labeled_barplot(data, "Geography", perc=True)
[Figure: labeled bar plot of Geography]
  • Observations:

  • Half of the customers are from France. This may indicate that there are more bank branches in France. That data is not provided.

Observations on Gender¶

In [23]:
# Calling the above function to create a barplot for 'Gender'
labeled_barplot(data, "Gender", perc=True)
[Figure: labeled bar plot of Gender]
  • Observations:

  • Gender is fairly evenly distributed: males make up 54.6% of customers and females 45.4%, a gap of about 9 percentage points.

Observations on Number of Products¶

In [24]:
# Calling the above function to create a barplot for 'NumOfProducts'
labeled_barplot(data, "NumOfProducts", perc=True)
[Figure: labeled bar plot of NumOfProducts]
  • Observations:

  • Slightly over half of the customers (50.8%) have only 1 bank product, closely followed by those with 2 products (45.9%).

  • Those with 3 and 4 bank products are a distant minority at 2.7% and 0.6% respectively.

Observations on Has Credit Card¶

In [25]:
# Calling the above function to create a barplot for 'HasCrCard'
labeled_barplot(data, "HasCrCard", perc=True)
[Figure: labeled bar plot of HasCrCard]
  • Observations:

  • The majority of customers do hold a credit card at 70.5% which is probably expected and typical.

Observations on Is Active Member¶

In [26]:
# Calling the above function to create a barplot for 'IsActiveMember'
labeled_barplot(data, "IsActiveMember", perc=True)
[Figure: labeled bar plot of IsActiveMember]
  • Observations:

  • Interestingly, IsActiveMember is fairly evenly distributed. I would expect more customers to be active. This may be something we can look into further in the business recommendations.

Observations on Exited¶

In [27]:
# Calling the above function to create a barplot for 'Exited'
labeled_barplot(data, "Exited", perc=True)
[Figure: labeled bar plot of Exited]
  • Observations:

  • As expected, the majority of customers have not exited the bank within the past 6 months.

  • 79.6% of customers have remained with the bank.

  • The data is imbalanced: nearly 80% of customers have stayed with the bank vs. 20.4% who have left.

Observations on Credit Score¶

In [28]:
# Calling the above function to create a histogram/boxplot for 'CreditScore'
histogram_boxplot(data, "CreditScore", kde=True)
[Figure: boxplot and histogram of CreditScore]
  • Observations:

  • Credit Score has a fairly even distribution curve.

  • There is a slight uptick at a credit score of around 850, which is the maximum value in the data. I'll have to compare CreditScore to EstimatedSalary and check correlation later.

Observations on Age¶

In [29]:
# Calling the above function to create a histogram/boxplot for 'Age'
histogram_boxplot(data, "Age", kde=True)
[Figure: boxplot and histogram of Age]
  • Observations:

  • Age is slightly right skewed with a median age of 37.

  • There is a slight uptick at around 38-39 in age, which does not seem abnormal.

  • There do seem to be some outliers beyond the upper whisker of the boxplot (more than 1.5 × IQR above the 75th percentile).
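
The boxplot flags a point as an outlier when it falls more than 1.5 × IQR beyond the quartiles (the seaborn/matplotlib default). Plugging in the Age quartiles from the statistical summary (25% = 32, 75% = 44) gives the limits; this is only a sketch of the rule, and the `whisker_limits` helper name is illustrative:

```python
def whisker_limits(q1, q3, k=1.5):
    """Return the (lower, upper) fences used by the standard boxplot rule."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

# Age quartiles taken from data.describe() above
lower, upper = whisker_limits(32.0, 44.0)
print(lower, upper)
```

With a minimum age of 18, nothing falls below the lower fence of 14; ages above 62 are the points drawn as outliers.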

Observations on Tenure¶

In [30]:
# Calling the above function to create a histogram/boxplot for 'Tenure'
histogram_boxplot(data, "Tenure", kde=True)
[Figure: boxplot and histogram of Tenure]
  • Observations:

  • Tenure has a fairly even distribution from customers ranging from around 1 year to 9 years.

Observations on Balance¶

In [31]:
# Calling the above function to create a histogram/boxplot for 'Balance'
histogram_boxplot(data, "Balance", kde=True)
[Figure: boxplot and histogram of Balance]
  • Observations:

  • Balance has a large spike at zero: a large number of customers hold zero balances, which pulls the mean (~76.5K) below the median (~97.2K).

  • Customers with large balances taper off around 200K.

  • Most customers with a non-zero balance have balances ranging from 50K to ~175K.

Observations on Estimated Salary¶

In [32]:
# Calling the above function to create a histogram/boxplot for 'EstimatedSalary'
histogram_boxplot(data, "EstimatedSalary", kde=True)
[Figure: boxplot and histogram of EstimatedSalary]
  • Observations:

  • Estimated Salary has a fairly even distribution from customers ranging from around 0 to 200K

Bivariate Analysis - (Target: Exited)¶

The functions below need to be defined to carry out the bivariate analysis.

In [33]:
### function to plot distributions wrt target

def distribution_plot_wrt_target(data, predictor, target):

    fig, axs = plt.subplots(2, 2, figsize=(12, 10))

    target_uniq = data[target].unique()

    axs[0, 0].set_title("Distribution of " + predictor + " for " + target + "=" + str(target_uniq[0]))
    sns.histplot(
        data=data[data[target] == target_uniq[0]],
        x=predictor,
        kde=True,
        ax=axs[0, 0],
        color="teal",
        stat="density",
    )

    axs[0, 1].set_title("Distribution of " + predictor + " for " + target + "=" + str(target_uniq[1]))
    sns.histplot(
        data=data[data[target] == target_uniq[1]],
        x=predictor,
        kde=True,
        ax=axs[0, 1],
        color="orange",
        stat="density",
    )

    axs[1, 0].set_title("Boxplot w.r.t target")
    sns.boxplot(data=data, x=target, y=predictor, ax=axs[1, 0], palette="gist_rainbow")

    axs[1, 1].set_title("Boxplot (without outliers) w.r.t target")
    sns.boxplot(
        data=data,
        x=target,
        y=predictor,
        ax=axs[1, 1],
        showfliers=False,
        palette="gist_rainbow",
    )

    plt.tight_layout()
    plt.show()
In [35]:
### function for a stacked barplot chart
def stacked_barplot(data, predictor, target):
    """
    Print the category counts and plot a stacked bar chart

    data: dataframe
    predictor: independent variable
    target: target variable
    """
    count = data[predictor].nunique()
    sorter = data[target].value_counts().index[-1]
    tab1 = pd.crosstab(data[predictor], data[target], margins=True).sort_values(
        by=sorter, ascending=False
    )
    print(tab1)
    print("-" * 120)
    tab = pd.crosstab(data[predictor], data[target], normalize="index").sort_values(
        by=sorter, ascending=False
    )
    tab.plot(kind="bar", stacked=True, figsize=(count + 5, 5))
    # Place the legend outside the plot area, to the right
    plt.legend(loc="upper left", bbox_to_anchor=(1, 1))
    plt.show()

Correlation Matrix (For Numeric Features)¶

In [36]:
# Plot a Correlation Matrix to help identify relationships between numeric variables.

# Select only numerical features for correlation analysis
numeric_data = data.select_dtypes(include=['number'])

# Compute and plot the correlation matrix
plt.figure(figsize=(10, 6))
sns.heatmap(numeric_data.corr(), annot=True, cmap="coolwarm", fmt=".2f", linewidths=0.5)
plt.title("Correlation Matrix of Numerical Features")
plt.show()
[Figure: correlation heatmap of the numerical features]
  • Observations:

  • Based on the above correlation matrix, I only see a couple of correlations that I may want to look at in more detail below. These are Age and Exited with a .29 correlation, and NumOfProducts and Balance at .3, which would seem to make sense.
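
`DataFrame.corr()` computes the Pearson coefficient by default. As a reminder of what each cell in the heatmap measures, here is a small stdlib-only sketch on made-up numbers (the `pearson` helper and the sample lists are illustrative, not drawn from the bank data):

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation: covariance divided by the product of standard deviations."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

print(pearson([1, 2, 3, 4], [2, 4, 6, 8]))  # perfectly linear, increasing
print(pearson([1, 2, 3, 4], [8, 6, 4, 2]))  # perfectly linear, decreasing
```

A value around 0.29, as seen for Age and Exited, points to a weak-to-moderate linear association; it says nothing about non-linear effects.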

Let's check some variables against our target.¶

In [37]:
# Calling the above function 'stacked_barplot' for 'Gender' and our target 'Exited'
stacked_barplot(data, "Gender", "Exited")
Exited     0     1    All
Gender                   
All     7963  2037  10000
Female  3404  1139   4543
Male    4559   898   5457
------------------------------------------------------------------------------------------------------------------------
[Figure: stacked bar chart of Exited proportions by Gender]
In [38]:
# Changing to CountPlots for ease of reading. 'Gender' and our target 'Exited'
sns.countplot(x='Gender', hue='Exited', data=data)
Out[38]:
<Axes: xlabel='Gender', ylabel='count'>
[Figure: count plot of Gender by Exited]
  • Observations:

  • Based on the above crosstab and CountPlot, roughly 25% of female customers (1,139 of 4,543) and 16% of male customers (898 of 5,457) have left the bank over the past 6 months.

  • On a percentage basis, more females have left the bank than their male counterparts. We may want to include some business recommendations below to promote programs that would be of interest to women.
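
The churn rate per gender can be read directly off the crosstab printed above. A short sketch using those counts (hard-coded here from the output, so no dataframe is needed):

```python
# (stayed, exited) counts per gender, copied from the crosstab output above
counts = {"Female": (3404, 1139), "Male": (4559, 898)}

churn_rate = {
    gender: exited / (stayed + exited)
    for gender, (stayed, exited) in counts.items()
}

for gender, rate in churn_rate.items():
    print(f"{gender}: {rate:.1%}")
```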

In [39]:
# Changing to CountPlots for ease of reading. 'Geography' and our target 'Exited'
sns.countplot(x='Geography', hue='Exited', data=data)
Out[39]:
<Axes: xlabel='Geography', ylabel='count'>
[Figure: count plot of Geography by Exited]
  • Observations:

  • Based on the above CountPlot, Germany has a higher percentage of customers who have recently left the bank.

  • We may want to create some promotional programs targeting Germany.

  • Most customers are in France, so France may have more bank branches. This may indicate a center of customer loyalty.

In [40]:
# Changing to CountPlots for ease of reading. 'HasCrCard' and our target 'Exited'
sns.countplot(x='HasCrCard', hue='Exited', data=data)
Out[40]:
<Axes: xlabel='HasCrCard', ylabel='count'>
[Figure: count plot of HasCrCard by Exited]
In [41]:
# Calling the above function 'stacked_barplot' for 'HasCrCard' and our target 'Exited'
stacked_barplot(data, "HasCrCard", "Exited")
Exited        0     1    All
HasCrCard                   
All        7963  2037  10000
1          5631  1424   7055
0          2332   613   2945
------------------------------------------------------------------------------------------------------------------------
[Figure: stacked bar chart of Exited proportions by HasCrCard]
  • Observations:

  • Based on the above charts, roughly 20% of customers left the bank recently, both among those with a credit card (1,424 of 7,055) and those without (613 of 2,945).

  • The vast majority of customers have not left, regardless of whether or not they had a bank credit card.

In [42]:
# CountPlot for 'IsActiveMember' and our target 'Exited'
sns.countplot(x='IsActiveMember', hue='Exited', data=data)
Out[42]:
<Axes: xlabel='IsActiveMember', ylabel='count'>
[Figure: count plot of IsActiveMember by Exited]
In [43]:
# Calling the above function 'stacked_barplot' for 'IsActiveMember' and our target 'Exited'
stacked_barplot(data, "IsActiveMember", "Exited")
Exited             0     1    All
IsActiveMember                   
All             7963  2037  10000
0               3547  1302   4849
1               4416   735   5151
------------------------------------------------------------------------------------------------------------------------
[Figure: stacked bar chart of Exited proportions by IsActiveMember]
  • Observations:

  • Based on the above charts, only ~14% of customers deemed active members (735 of 5,151) have left the bank, vs. ~27% of customers deemed not active (1,302 of 4,849).

In [44]:
# CountPlot for 'NumOfProducts' and our target 'Exited'
sns.countplot(x='NumOfProducts', hue='Exited', data=data)
Out[44]:
<Axes: xlabel='NumOfProducts', ylabel='count'>
[Figure: count plot of NumOfProducts by Exited]
In [45]:
# Calling the above function 'stacked_barplot' for 'NumOfProducts' and our target 'Exited'
stacked_barplot(data, "NumOfProducts", "Exited")
Exited            0     1    All
NumOfProducts                   
All            7963  2037  10000
1              3675  1409   5084
2              4242   348   4590
3                46   220    266
4                 0    60     60
------------------------------------------------------------------------------------------------------------------------
[Figure: stacked bar chart of Exited proportions by NumOfProducts]
  • Observations:

  • Interestingly, the sweet spot here is customers using 2 bank products: only ~8% of them (348 of 4,590) have left, compared with ~28% of customers using 1 product (1,409 of 5,084).

  • For customers using 3 different bank products, over 80% (220 of 266) have left the bank recently. We may want to look at this, as those customers may be confused over the bank products.

  • Lastly, 100% of customers using 4 bank products (60 of 60) left. This may in fact indicate confusion about products amongst the bank's customers. We may want to send a survey or create a reach-out program to support these customers.
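
When reading these rates it helps to keep the group sizes in view, since the 3- and 4-product groups are small. The sketch below recomputes both figures from the crosstab counts printed above (hard-coded; the `summary` structure is just for illustration):

```python
# (stayed, exited) counts per NumOfProducts, copied from the crosstab above
counts = {1: (3675, 1409), 2: (4242, 348), 3: (46, 220), 4: (0, 60)}
total_customers = sum(stayed + exited for stayed, exited in counts.values())

summary = {}
for products, (stayed, exited) in counts.items():
    group_size = stayed + exited
    summary[products] = {
        "churn_rate": exited / group_size,  # share of the group that left
        "share_of_customers": group_size / total_customers,  # relative group size
    }

for products, row in summary.items():
    print(
        f"{products} product(s): churn {row['churn_rate']:.1%}, "
        f"{row['share_of_customers']:.1%} of customers"
    )
```

The very high churn among 3- and 4-product customers rests on 326 customers in total, so it is worth confirming with the business before drawing strong conclusions.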

In [46]:
# Calling the above function 'distribution_plot_wrt_target' for 'Age' and our target 'Exited'
distribution_plot_wrt_target(data, predictor='Age', target='Exited')
[Figure: histograms and boxplots of Age by Exited]
  • Observations:

  • The age distribution of customers who have left the bank recently is fairly symmetric; for those customers that did not leave, it is right skewed, with a higher concentration of customers around 30-40 years of age.

In [47]:
# Calling the above function 'distribution_plot_wrt_target' for 'Balance' and our target 'Exited'
distribution_plot_wrt_target(data, predictor='Balance', target='Exited')
[Figure: histograms and boxplots of Balance by Exited]
  • Observations:

  • Customers who did not exit (Exited = 0) show a large spike at 0 balance and a fairly uniform spread beyond that.

  • Customers who exited (Exited = 1) also have a spike at 0, but the distribution is more concentrated in the mid-range (50k–150k).

  • On the bottom left boxplot, the median balance is slightly higher for customers who exited.

  • On the bottom right boxplot, you can see there is a wider account balance spread among customers who have exited.

In [48]:
# Calling the above function 'distribution_plot_wrt_target' for 'EstimatedSalary' and our target 'Exited'
distribution_plot_wrt_target(data, predictor='EstimatedSalary', target='Exited')
[Figure: histograms and boxplots of EstimatedSalary by Exited]
  • Observations:

  • Both exited and retained customers have a fairly uniform distribution of salaries.

  • There's no strong pattern or peak, suggesting EstimatedSalary is likely not a strong predictor of customer churn by itself.

  • There are no significant differences in spread or outliers here.

In [49]:
# Calling the above function 'distribution_plot_wrt_target' for 'CreditScore' and our target 'Exited'
distribution_plot_wrt_target(data, predictor='CreditScore', target='Exited')
[Figure: histograms and boxplots of CreditScore by Exited]
  • Observations:

  • Both exited and retained customers have only a slight left skew to them with credit scores approximately in the 550-750 range.

  • The median credit score for both groups is very close at approximately 650, also possibly indicating that credit score alone is not a good predictor of customer churn.

In [50]:
# Calling the above function 'distribution_plot_wrt_target' for 'Tenure' and our target 'Exited'
distribution_plot_wrt_target(data, predictor='Tenure', target='Exited')
[Figure: histograms and boxplots of Tenure by Exited]
  • Observations:

  • Both exited and retained customers have a fairly even distribution.

  • The median tenure for both groups is very close at approximately 5 years, potentially indicating that tenure alone is also not a good predictor of customer churn.

Data Preprocessing¶

In [51]:
# Let's take another look at our data copy to ensure we did not make any errors above during EDA.
# Above we removed RowNumber, CustomerId, and Surname — not useful for modeling
data.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10000 entries, 0 to 9999
Data columns (total 11 columns):
 #   Column           Non-Null Count  Dtype  
---  ------           --------------  -----  
 0   CreditScore      10000 non-null  int64  
 1   Geography        10000 non-null  object 
 2   Gender           10000 non-null  object 
 3   Age              10000 non-null  int64  
 4   Tenure           10000 non-null  int64  
 5   Balance          10000 non-null  float64
 6   NumOfProducts    10000 non-null  int64  
 7   HasCrCard        10000 non-null  int64  
 8   IsActiveMember   10000 non-null  int64  
 9   EstimatedSalary  10000 non-null  float64
 10  Exited           10000 non-null  int64  
dtypes: float64(2), int64(7), object(2)
memory usage: 859.5+ KB

Dummy Variable Creation¶

In [52]:
# Create dummy variables from all object (categorical) columns in 'data'
data = pd.get_dummies(
    data,
    columns=data.select_dtypes(include=["object"]).columns.tolist(),
    drop_first=True
)

# Convert all columns to float (useful for modeling)
data = data.astype(float)

# Preview the result
data.head()
Out[52]:
CreditScore Age Tenure Balance NumOfProducts HasCrCard IsActiveMember EstimatedSalary Exited Geography_Germany Geography_Spain Gender_Male
0 619.0 42.0 2.0 0.00 1.0 1.0 1.0 101348.88 1.0 0.0 0.0 0.0
1 608.0 41.0 1.0 83807.86 1.0 0.0 1.0 112542.58 0.0 0.0 1.0 0.0
2 502.0 42.0 8.0 159660.80 3.0 1.0 0.0 113931.57 1.0 0.0 0.0 0.0
3 699.0 39.0 1.0 0.00 2.0 0.0 0.0 93826.63 0.0 0.0 0.0 0.0
4 850.0 43.0 2.0 125510.82 1.0 1.0 1.0 79084.10 0.0 0.0 1.0 0.0
In [53]:
# Just checking the shape of the data after encoding
data.shape
Out[53]:
(10000, 12)
In [54]:
# Just checking the data columns after encoding
data.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10000 entries, 0 to 9999
Data columns (total 12 columns):
 #   Column             Non-Null Count  Dtype  
---  ------             --------------  -----  
 0   CreditScore        10000 non-null  float64
 1   Age                10000 non-null  float64
 2   Tenure             10000 non-null  float64
 3   Balance            10000 non-null  float64
 4   NumOfProducts      10000 non-null  float64
 5   HasCrCard          10000 non-null  float64
 6   IsActiveMember     10000 non-null  float64
 7   EstimatedSalary    10000 non-null  float64
 8   Exited             10000 non-null  float64
 9   Geography_Germany  10000 non-null  float64
 10  Geography_Spain    10000 non-null  float64
 11  Gender_Male        10000 non-null  float64
dtypes: float64(12)
memory usage: 937.6 KB
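
To make the effect of `drop_first=True` concrete, the sketch below re-implements the encoding in plain Python for a single column. The helper `one_hot_drop_first` and the three sample rows are illustrative assumptions, not part of the notebook. `France` is assumed to be the dropped baseline category, since only `Geography_Germany` and `Geography_Spain` appear in the encoded data.

```python
def one_hot_drop_first(values, prefix):
    """Encode a categorical column as 0/1 floats, dropping the first
    category in sorted order so that it becomes the implicit baseline."""
    categories = sorted(set(values))
    kept = categories[1:]  # the first category is represented by all zeros
    return [
        {f"{prefix}_{cat}": float(value == cat) for cat in kept}
        for value in values
    ]

# Hypothetical sample rows (category names are assumed)
geography = ["France", "Spain", "Germany"]
encoded = one_hot_drop_first(geography, prefix="Geography")
for original, row in zip(geography, encoded):
    print(original, row)
```

A row with zeros in both dummy columns therefore stands for the baseline category, which is why three locations need only two columns.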

Train-validation-test Split¶

In [55]:
# Split Predictors (X) and Target (y)
X = data.drop(['Exited'],axis=1)
y = data['Exited'] # Exited
In [56]:
# Splitting the dataset into the Training and Testing set.
# Setting test size to 20%

X_large, X_test, y_large, y_test = train_test_split(X, y, test_size = 0.2, random_state = 42,stratify=y,shuffle = True)
In [57]:
# Splitting the dataset further into the Training and Validation set.
# 64% training (80% of the above 80%)
# 16% validation (20% of the above 80%)
# 20% test (already set aside above)

X_train, X_val, y_train, y_val = train_test_split(X_large, y_large, test_size = 0.2, random_state = 42,stratify=y_large, shuffle = True)
In [58]:
print(X_train.shape, X_val.shape, X_test.shape)
(6400, 11) (1600, 11) (2000, 11)
In [59]:
print(y_train.shape, y_val.shape, y_test.shape)
(6400,) (1600,) (2000,)
  • Observations:

  • The data split as expected across training/validation/test and there are 11 features.
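
As a quick sanity check on the 64% / 16% / 20% split, the arithmetic can be reproduced without scikit-learn. This is a minimal sketch: `split_sizes` is a hypothetical helper that only computes row counts, it does not shuffle or stratify anything. The churner counts (1304 and 326) are the class supports that appear in the classification reports further down.

```python
def split_sizes(n_rows, holdout_pct):
    """Return (kept_rows, holdout_rows) for a percentage hold-out, using integer arithmetic."""
    holdout_rows = n_rows * holdout_pct // 100
    return n_rows - holdout_rows, holdout_rows

n_total = 10000
n_large, n_test = split_sizes(n_total, 20)   # first split: 80% / 20%
n_train, n_val = split_sizes(n_large, 20)    # second split: 80% / 20% of the remainder

for name, n in [("train", n_train), ("validation", n_val), ("test", n_test)]:
    print(f"{name}: {n} rows ({100 * n / n_total:.0f}% of the data)")

# Stratification keeps the churn rate the same in each split
churn_rate_train = 1304 / n_train
churn_rate_val = 326 / n_val
print(churn_rate_train, churn_rate_val)
```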

Data Normalization¶

Note: Since the numerical features are on different scales, we will standardize them to bring them onto a common scale.

In [60]:
# Automatically identify numeric columns with more than 2 unique values (excluding binary)
cols_list = [col for col in X_train.columns if X_train[col].nunique() > 2]

# Check which columns will be scaled
print("Columns to be scaled:", cols_list)
Columns to be scaled: ['CreditScore', 'Age', 'Tenure', 'Balance', 'NumOfProducts', 'EstimatedSalary']
In [61]:
# import StandardScaler
from sklearn.preprocessing import StandardScaler

sc = StandardScaler()

X_train[cols_list] = sc.fit_transform(X_train[cols_list])
X_val[cols_list] = sc.transform(X_val[cols_list])
X_test[cols_list] = sc.transform(X_test[cols_list])
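
The cell above fits the scaler on the training set only and reuses those statistics for the validation and test sets, which avoids leaking information from held-out data. The sketch below shows the same idea in plain Python for one column. The helpers `fit_scaler` and `transform` and the sample ages are assumptions for illustration. The population standard deviation is used because that is what `StandardScaler` computes.

```python
from statistics import fmean, pstdev

def fit_scaler(values):
    """Learn the mean and population standard deviation from training values."""
    return fmean(values), pstdev(values)

def transform(values, mean, std):
    """Standardize values with statistics learned elsewhere (here: the training set)."""
    return [(v - mean) / std for v in values]

# Hypothetical ages, for illustration only
train_age = [30.0, 40.0, 50.0]
val_age = [35.0, 60.0]

mean, std = fit_scaler(train_age)            # statistics come from training data only
train_scaled = transform(train_age, mean, std)
val_scaled = transform(val_age, mean, std)   # validation reuses the training statistics

print(train_scaled)
print(val_scaled)
```

The scaled training column has mean 0 and standard deviation 1 by construction. The validation column generally does not, and that is expected.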

Model Building¶

Model Evaluation Criterion¶

A model can make wrong predictions in the following ways:

  • Predicting that a bank customer will leave and switch banks when they are in fact not going to (False Positive). F1 is a good metric when these matter.
  • Predicting that a bank customer will not leave and switch banks when they are in fact going to (False Negative). Recall is a good metric for minimizing these false negatives.

Which case is more important?

  • Both are actually important in our case.

  • In our use case of predicting bank customer churn, it would make sense to use either Recall or potentially F1 as our model performance metric.

  • F1 could be used to strike a good balance, avoiding spamming loyal customers with retention offers, and still catch as many churners as possible.

  • I have chosen to use Recall, however, to prioritize the model on reducing false negatives and to identify every customer who may be at risk of leaving, even if we sometimes "alert" falsely.

** As we are dealing with an imbalance in class distribution, we will use class weights for the first 3 models so that the model gives proportionally more importance to the minority class, and SMOTE (without class weights) for the last 3 models.**

In [62]:
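# Calculate class weights for imbalanced dataset (n_samples / class count)
# np.bincount expects integer labels, and 'Exited' was cast to float above
cw = (y_train.shape[0]) / np.bincount(y_train.astype(int))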

# Create a dictionary mapping class indices to their respective class weights
cw_dict = {}
for i in range(cw.shape[0]):
    cw_dict[i] = cw[i]

cw_dict
Out[62]:
{0: 1.2558869701726845, 1: 4.9079754601226995}
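
The weights above are simply the number of training rows divided by the count of each class. The sketch below reproduces them in plain Python from the training class counts (5096 stayed, 1304 exited) and also computes the average weight per sample. The helper name `inverse_frequency_weights` is an assumption for illustration.

```python
from collections import Counter

def inverse_frequency_weights(labels):
    """Weight each class by n_samples / class_count."""
    counts = Counter(labels)
    n_samples = len(labels)
    return {cls: n_samples / count for cls, count in sorted(counts.items())}

# Rebuild a label list with the training class counts
labels = [0] * 5096 + [1] * 1304
weights = inverse_frequency_weights(labels)
print(weights)

# Average weight applied per training sample
mean_weight = sum(weights[label] for label in labels) / len(labels)
print(mean_weight)
```

The average weight comes out at about 2 rather than 1. Since Keras applies `class_weight` to the training loss only, this is a likely reason why the training loss of the class-weighted models sits at roughly twice the scale of the validation loss, so the two curves are not directly comparable. For reference, scikit-learn's "balanced" heuristic also divides by the number of classes, n_samples / (n_classes * count), which gives the same relative weighting with a mean weight of 1.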
In [63]:
# defining the batch size and # epochs upfront as we'll be using the same values for all models
epochs = 25
batch_size = 64
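
For orientation, these two settings determine how many weight updates each model receives. A minimal sketch, assuming the 6400 training rows from the split above:

```python
import math

n_train_rows = 6400   # training rows after the split above
batch_size = 64
epochs = 25

steps_per_epoch = math.ceil(n_train_rows / batch_size)  # a final partial batch would still count as a step
total_updates = steps_per_epoch * epochs

print(steps_per_epoch, total_updates)
```

The steps per epoch should match the batch counter that Keras prints in its training progress output.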
  • Creating a function for plotting the confusion matrix
In [64]:
def make_confusion_matrix(actual_targets, predicted_targets):
    """
    To plot the confusion_matrix with percentages

    actual_targets: actual target (dependent) variable values
    predicted_targets: predicted target (dependent) variable values
    """
    cm = confusion_matrix(actual_targets, predicted_targets)
    labels = np.asarray(
        [
            ["{0:0.0f}".format(item) + "\n{0:.2%}".format(item / cm.flatten().sum())]
            for item in cm.flatten()
        ]
    ).reshape(cm.shape[0], cm.shape[1])

    plt.figure(figsize=(6, 4))
    sns.heatmap(cm, annot=labels, fmt="")
    plt.ylabel("True label")
    plt.xlabel("Predicted label")
  • Creating two blank dataframes that will store the recall values for all the models we build, to track training and validation dataset performance.
In [65]:
train_metric_df = pd.DataFrame(columns=["recall"])
valid_metric_df = pd.DataFrame(columns=["recall"])

Model 0 - Neural Network with SGD Optimizer¶

  • Let's start with a neural network consisting of
    • two hidden layers with 14 and 7 neurons respectively
    • activation function of ReLU.
    • SGD as the optimizer
In [66]:
# clears the current Keras session, resetting all layers and models previously created, freeing up memory and resources.
tf.keras.backend.clear_session()

#Fixing the seed for random number generators so that we can ensure we receive the same output every time
np.random.seed(2)
random.seed(2)
tf.random.set_seed(2)
In [67]:
#Initializing the neural network
model = Sequential()
model.add(Dense(14,activation="relu",input_dim=X_train.shape[1]))
model.add(Dense(7,activation="relu"))
model.add(Dense(1,activation="sigmoid"))
In [68]:
optimizer = tf.keras.optimizers.SGD(0.001)    # defining SGD as the optimizer to be used
metric = tf.keras.metrics.Recall()
model.compile(loss='binary_crossentropy', optimizer=optimizer, metrics=[metric])
In [69]:
model.summary()
Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 dense (Dense)               (None, 14)                168       
                                                                 
 dense_1 (Dense)             (None, 7)                 105       
                                                                 
 dense_2 (Dense)             (None, 1)                 8         
                                                                 
=================================================================
Total params: 281 (1.10 KB)
Trainable params: 281 (1.10 KB)
Non-trainable params: 0 (0.00 Byte)
_________________________________________________________________
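
The parameter counts in the summary can be checked by hand: each Dense layer has one weight per (input, unit) pair plus one bias per unit. The sketch below does this for the 11 input features and the 14-7-1 architecture. `dense_param_count` is a hypothetical helper, not a Keras function.

```python
def dense_param_count(n_inputs, layer_units):
    """Return per-layer parameter counts for a stack of fully connected layers."""
    counts = []
    fan_in = n_inputs
    for units in layer_units:
        counts.append((fan_in + 1) * units)  # weights plus one bias per unit
        fan_in = units                       # this layer's output feeds the next layer
    return counts

per_layer = dense_param_count(n_inputs=11, layer_units=[14, 7, 1])
print(per_layer, sum(per_layer))
```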
In [70]:
#Fitting the ANN

start = time.time()
history = model.fit(X_train, y_train,
                    validation_data=(X_val, y_val),
                    batch_size=batch_size,
                    epochs=epochs,
                    class_weight=cw_dict)
end = time.time()
Epoch 1/25
100/100 [==============================] - 1s 4ms/step - loss: 1.5586 - recall: 0.9931 - val_loss: 0.9099 - val_recall: 0.9908
Epoch 2/25
100/100 [==============================] - 0s 2ms/step - loss: 1.5196 - recall: 0.9808 - val_loss: 0.8709 - val_recall: 0.9663
Epoch 3/25
100/100 [==============================] - 0s 2ms/step - loss: 1.4921 - recall: 0.9494 - val_loss: 0.8414 - val_recall: 0.9387
Epoch 4/25
100/100 [==============================] - 0s 2ms/step - loss: 1.4718 - recall: 0.9110 - val_loss: 0.8182 - val_recall: 0.9080
Epoch 5/25
100/100 [==============================] - 0s 2ms/step - loss: 1.4563 - recall: 0.8827 - val_loss: 0.7998 - val_recall: 0.8620
Epoch 6/25
100/100 [==============================] - 0s 2ms/step - loss: 1.4443 - recall: 0.8374 - val_loss: 0.7849 - val_recall: 0.8190
Epoch 7/25
100/100 [==============================] - 0s 2ms/step - loss: 1.4348 - recall: 0.7860 - val_loss: 0.7726 - val_recall: 0.7914
Epoch 8/25
100/100 [==============================] - 0s 2ms/step - loss: 1.4271 - recall: 0.7469 - val_loss: 0.7622 - val_recall: 0.7577
Epoch 9/25
100/100 [==============================] - 0s 2ms/step - loss: 1.4206 - recall: 0.7216 - val_loss: 0.7534 - val_recall: 0.7117
Epoch 10/25
100/100 [==============================] - 0s 2ms/step - loss: 1.4153 - recall: 0.6771 - val_loss: 0.7459 - val_recall: 0.6779
Epoch 11/25
100/100 [==============================] - 0s 2ms/step - loss: 1.4107 - recall: 0.6549 - val_loss: 0.7395 - val_recall: 0.6564
Epoch 12/25
100/100 [==============================] - 0s 2ms/step - loss: 1.4068 - recall: 0.6273 - val_loss: 0.7339 - val_recall: 0.6350
Epoch 13/25
100/100 [==============================] - 0s 2ms/step - loss: 1.4033 - recall: 0.6081 - val_loss: 0.7289 - val_recall: 0.6135
Epoch 14/25
100/100 [==============================] - 0s 2ms/step - loss: 1.4003 - recall: 0.5867 - val_loss: 0.7245 - val_recall: 0.6043
Epoch 15/25
100/100 [==============================] - 0s 2ms/step - loss: 1.3975 - recall: 0.5637 - val_loss: 0.7206 - val_recall: 0.5859
Epoch 16/25
100/100 [==============================] - 0s 2ms/step - loss: 1.3950 - recall: 0.5445 - val_loss: 0.7172 - val_recall: 0.5706
Epoch 17/25
100/100 [==============================] - 0s 2ms/step - loss: 1.3927 - recall: 0.5345 - val_loss: 0.7141 - val_recall: 0.5368
Epoch 18/25
100/100 [==============================] - 0s 2ms/step - loss: 1.3905 - recall: 0.5169 - val_loss: 0.7114 - val_recall: 0.5153
Epoch 19/25
100/100 [==============================] - 0s 2ms/step - loss: 1.3886 - recall: 0.5130 - val_loss: 0.7089 - val_recall: 0.5031
Epoch 20/25
100/100 [==============================] - 0s 2ms/step - loss: 1.3867 - recall: 0.4962 - val_loss: 0.7067 - val_recall: 0.5000
Epoch 21/25
100/100 [==============================] - 0s 2ms/step - loss: 1.3850 - recall: 0.4877 - val_loss: 0.7047 - val_recall: 0.4969
Epoch 22/25
100/100 [==============================] - 0s 2ms/step - loss: 1.3834 - recall: 0.4923 - val_loss: 0.7029 - val_recall: 0.4877
Epoch 23/25
100/100 [==============================] - 0s 2ms/step - loss: 1.3818 - recall: 0.4847 - val_loss: 0.7013 - val_recall: 0.4877
Epoch 24/25
100/100 [==============================] - 0s 2ms/step - loss: 1.3802 - recall: 0.4801 - val_loss: 0.6998 - val_recall: 0.4908
Epoch 25/25
100/100 [==============================] - 0s 2ms/step - loss: 1.3788 - recall: 0.4793 - val_loss: 0.6985 - val_recall: 0.4877
In [71]:
print("Time taken in seconds ",end-start)
Time taken in seconds  6.09728479385376

Loss Function

In [72]:
#Plotting Train Loss vs Validation Loss
plt.plot(history.history['loss'])
plt.plot(history.history['val_loss'])
plt.title('model loss')
plt.ylabel('Loss')
plt.xlabel('Epoch')
plt.legend(['train', 'validation'], loc='upper left')
plt.show()
No description has been provided for this image

Recall

In [73]:
#Plotting Train recall vs Validation recall
plt.plot(history.history['recall'])
plt.plot(history.history['val_recall'])
plt.title('model recall')
plt.ylabel('Recall')
plt.xlabel('Epoch')
plt.legend(['train', 'validation'], loc='upper left')
plt.show()
No description has been provided for this image
In [74]:
#Predicting the results using 0.5 as the threshold on training set
y_train_pred = model.predict(X_train)
y_train_pred = (y_train_pred > 0.5)
y_train_pred
200/200 [==============================] - 0s 1ms/step
Out[74]:
array([[ True],
       [ True],
       [ True],
       ...,
       [False],
       [ True],
       [ True]])
In [75]:
#Predicting the results using 0.5 as the threshold on validation set
y_val_pred = model.predict(X_val)
y_val_pred = (y_val_pred > 0.5)
y_val_pred
50/50 [==============================] - 0s 1ms/step
Out[75]:
array([[False],
       [ True],
       [ True],
       ...,
       [ True],
       [False],
       [False]])
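
The 0.5 cut-off used above is a choice rather than a requirement, and recall is sensitive to it. The sketch below uses hypothetical labels and scores (not taken from the model) to show how lowering the threshold raises recall for the positive class. `recall_at_threshold` is an illustrative helper.

```python
def recall_at_threshold(y_true, scores, threshold):
    """Recall for the positive class when predicting 1 for scores above the threshold."""
    predicted = [score > threshold for score in scores]
    true_pos = sum(1 for t, p in zip(y_true, predicted) if t == 1 and p)
    false_neg = sum(1 for t, p in zip(y_true, predicted) if t == 1 and not p)
    return true_pos / (true_pos + false_neg)

# Hypothetical labels and sigmoid outputs, for illustration only
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.90, 0.60, 0.45, 0.30, 0.55, 0.40, 0.20, 0.10]

for threshold in (0.5, 0.4, 0.25):
    print(threshold, recall_at_threshold(y_true, scores, threshold))
```

A lower threshold catches more churners at the cost of flagging more customers who would have stayed, so any threshold tuning should be done on the validation set and judged against that cost.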
In [76]:
model_name = "NN with SGD"

train_metric_df.loc[model_name] = recall_score(y_train, y_train_pred)
valid_metric_df.loc[model_name] = recall_score(y_val, y_val_pred)

Classification Report

In [77]:
#Classification report on training set
cr = classification_report(y_train, y_train_pred)
print(cr)
              precision    recall  f1-score   support

         0.0       0.81      0.58      0.68      5096
         1.0       0.23      0.48      0.31      1304

    accuracy                           0.56      6400
   macro avg       0.52      0.53      0.49      6400
weighted avg       0.69      0.56      0.60      6400

In [78]:
#classification report on validation set
cr=classification_report(y_val,y_val_pred)
print(cr)
              precision    recall  f1-score   support

         0.0       0.81      0.56      0.66      1274
         1.0       0.22      0.49      0.30       326

    accuracy                           0.55      1600
   macro avg       0.52      0.52      0.48      1600
weighted avg       0.69      0.55      0.59      1600

In [81]:
# Generate a Classification Heatmap
report = classification_report(y_val, y_val_pred, output_dict=True)
df_report = pd.DataFrame(report).transpose()

plt.figure(figsize=(8, 5))
sns.heatmap(df_report.iloc[:-1, :-1], annot=True, cmap="YlGnBu", fmt=".2f")
plt.title("Classification Report Heatmap")
plt.show()
No description has been provided for this image
In [84]:
# Generate ROC Curve to show trade-off between True Positive Rate and False Positive Rate.

y_scores = model.predict(X_val).ravel()
fpr, tpr, thresholds = roc_curve(y_val, y_scores)
roc_auc = auc(fpr, tpr)

plt.figure()
plt.plot(fpr, tpr, label=f"AUC = {roc_auc:.2f}")
plt.plot([0, 1], [0, 1], linestyle='--')
plt.xlabel("False Positive Rate")
plt.ylabel("True Positive Rate")
plt.title("ROC Curve")
plt.legend()
plt.grid(True)
plt.show()
50/50 [==============================] - 0s 4ms/step
No description has been provided for this image

Focus on the Business Goal: We're predicting customer churn (1.0 = exited) — so catching as many churners as possible is key.

That makes Recall for class 1.0 (churners) the most important metric to watch.

For the Validation Set:

  • Class 0.0 (Stayed): Precision=0.81, Recall=0.56

  • Class 1.0 (Exited): Precision=0.22, Recall=0.49

  • An AUC of 0.52 signals that our current model is only slightly better than random guessing, which isn't strong enough for reliable churn prediction yet.

  • The model correctly identifies 49% of churners, just under half. This is of limited use for targeted retention campaigns, since the precision of 0.22 for this class means most flagged customers would in fact stay.

  • Our model can currently identify ~49% of customers likely to churn. While it may incorrectly flag some customers who would stay, this recall level allows proactive retention efforts, such as targeted offers or outreach campaigns, to focus on a meaningful subset of at-risk customers.
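
To give the AUC figure some intuition: AUC can be read as the probability that a randomly chosen churner receives a higher score than a randomly chosen non-churner. The sketch below computes it that way on hypothetical scores. It is not how `roc_curve` and `auc` work internally, and `roc_auc_by_ranking` is an illustrative helper.

```python
def roc_auc_by_ranking(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count half."""
    positives = [s for t, s in zip(y_true, scores) if t == 1]
    negatives = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))

# Hypothetical scores, for illustration only
y_true = [1, 1, 1, 0, 0, 0]
well_separated = [0.9, 0.8, 0.7, 0.3, 0.2, 0.1]
partly_mixed = [0.9, 0.4, 0.6, 0.5, 0.3, 0.1]
uninformative = [0.5, 0.5, 0.5, 0.5, 0.5, 0.5]

for name, scores in [("well separated", well_separated),
                     ("partly mixed", partly_mixed),
                     ("uninformative", uninformative)]:
    print(name, round(roc_auc_by_ranking(y_true, scores), 3))
```

An AUC near 0.5 therefore means the scores barely order churners ahead of non-churners, regardless of which threshold is chosen.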

Confusion matrix

In [79]:
#Calculating the confusion matrix on training set
make_confusion_matrix(y_train, y_train_pred)
No description has been provided for this image
In [80]:
#Calculating the confusion matrix on validation set
make_confusion_matrix(y_val,y_val_pred)
No description has been provided for this image

** Observations **

  • Out of all customers that actually churned, we were able to identify 49% of them based on Recall. While our predictions also flagged some loyal customers incorrectly (false positives), this trade-off may be acceptable when the cost of losing a customer is high and retention efforts are relatively inexpensive.

  • Recall (for churners) = TP / (TP + FN) = 159 / (159 + 167) ≈ 0.49
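
The recall arithmetic, and the F1 score it feeds into, can be verified with a few lines of Python. The counts and the precision value are the ones reported for the validation set. The helper names are illustrative.

```python
def recall(true_pos, false_neg):
    """Share of actual positives that the model caught."""
    return true_pos / (true_pos + false_neg)

def f1(precision, recall_value):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall_value / (precision + recall_value)

churner_recall = recall(true_pos=159, false_neg=167)
churner_f1 = f1(precision=0.22, recall_value=0.49)

print(round(churner_recall, 2), round(churner_f1, 2))
```

Both values agree with the validation classification report (recall 0.49 and f1-score 0.30 for class 1.0).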

Model Performance Improvement¶

Model 1 - Neural Network with Adam Optimizer¶

  • Now let's switch to a NN model using the Adam optimizer, consisting of
    • two hidden layers with 14 and 7 neurons respectively
    • activation function of ReLU.
    • Adam as the optimizer
In [85]:
# clears the current Keras session, resetting all layers and models previously created, freeing up memory and resources.
tf.keras.backend.clear_session()

#Fixing the seed for random number generators so that we can ensure we receive the same output every time
np.random.seed(2)
random.seed(2)
tf.random.set_seed(2)
In [86]:
#Initializing the neural network
model_1 = Sequential()
model_1.add(Dense(14,activation="relu",input_dim=X_train.shape[1]))
model_1.add(Dense(7,activation="relu"))
model_1.add(Dense(1,activation="sigmoid"))
In [87]:
optimizer = tf.keras.optimizers.Adam(learning_rate=0.001)    # defining Adam as the optimizer to be used
metric = tf.keras.metrics.Recall()
model_1.compile(loss='binary_crossentropy', optimizer=optimizer, metrics=[metric])
In [88]:
model_1.summary()
Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 dense (Dense)               (None, 14)                168       
                                                                 
 dense_1 (Dense)             (None, 7)                 105       
                                                                 
 dense_2 (Dense)             (None, 1)                 8         
                                                                 
=================================================================
Total params: 281 (1.10 KB)
Trainable params: 281 (1.10 KB)
Non-trainable params: 0 (0.00 Byte)
_________________________________________________________________
In [89]:
#Fitting the ANN

start = time.time()
history_1 = model_1.fit(X_train, y_train,
                    validation_data=(X_val, y_val),
                    batch_size=batch_size,
                    epochs=epochs,
                    class_weight=cw_dict)
end = time.time()
Epoch 1/25
100/100 [==============================] - 1s 5ms/step - loss: 1.4336 - recall: 0.6143 - val_loss: 0.6760 - val_recall: 0.6350
Epoch 2/25
100/100 [==============================] - 0s 2ms/step - loss: 1.2864 - recall: 0.6787 - val_loss: 0.6083 - val_recall: 0.6472
Epoch 3/25
100/100 [==============================] - 0s 2ms/step - loss: 1.2037 - recall: 0.6787 - val_loss: 0.6035 - val_recall: 0.7117
Epoch 4/25
100/100 [==============================] - 0s 2ms/step - loss: 1.1468 - recall: 0.7132 - val_loss: 0.5716 - val_recall: 0.6871
Epoch 5/25
100/100 [==============================] - 0s 2ms/step - loss: 1.1090 - recall: 0.7370 - val_loss: 0.5403 - val_recall: 0.6718
Epoch 6/25
100/100 [==============================] - 0s 2ms/step - loss: 1.0787 - recall: 0.7339 - val_loss: 0.5337 - val_recall: 0.6963
Epoch 7/25
100/100 [==============================] - 0s 2ms/step - loss: 1.0518 - recall: 0.7523 - val_loss: 0.5372 - val_recall: 0.7423
Epoch 8/25
100/100 [==============================] - 0s 2ms/step - loss: 1.0294 - recall: 0.7592 - val_loss: 0.5281 - val_recall: 0.7454
Epoch 9/25
100/100 [==============================] - 0s 2ms/step - loss: 1.0119 - recall: 0.7653 - val_loss: 0.5039 - val_recall: 0.7239
Epoch 10/25
100/100 [==============================] - 0s 2ms/step - loss: 0.9950 - recall: 0.7554 - val_loss: 0.5424 - val_recall: 0.7914
Epoch 11/25
100/100 [==============================] - 0s 3ms/step - loss: 0.9839 - recall: 0.7646 - val_loss: 0.5067 - val_recall: 0.7607
Epoch 12/25
100/100 [==============================] - 0s 3ms/step - loss: 0.9791 - recall: 0.7699 - val_loss: 0.4945 - val_recall: 0.7515
Epoch 13/25
100/100 [==============================] - 0s 3ms/step - loss: 0.9665 - recall: 0.7722 - val_loss: 0.4937 - val_recall: 0.7607
Epoch 14/25
100/100 [==============================] - 0s 3ms/step - loss: 0.9583 - recall: 0.7753 - val_loss: 0.4710 - val_recall: 0.7423
Epoch 15/25
100/100 [==============================] - 0s 3ms/step - loss: 0.9555 - recall: 0.7745 - val_loss: 0.4845 - val_recall: 0.7546
Epoch 16/25
100/100 [==============================] - 0s 3ms/step - loss: 0.9493 - recall: 0.7715 - val_loss: 0.4823 - val_recall: 0.7607
Epoch 17/25
100/100 [==============================] - 0s 3ms/step - loss: 0.9460 - recall: 0.7730 - val_loss: 0.4794 - val_recall: 0.7638
Epoch 18/25
100/100 [==============================] - 0s 3ms/step - loss: 0.9416 - recall: 0.7730 - val_loss: 0.4883 - val_recall: 0.7638
Epoch 19/25
100/100 [==============================] - 0s 3ms/step - loss: 0.9391 - recall: 0.7623 - val_loss: 0.4677 - val_recall: 0.7423
Epoch 20/25
100/100 [==============================] - 0s 3ms/step - loss: 0.9376 - recall: 0.7707 - val_loss: 0.4922 - val_recall: 0.7791
Epoch 21/25
100/100 [==============================] - 0s 3ms/step - loss: 0.9344 - recall: 0.7707 - val_loss: 0.4760 - val_recall: 0.7546
Epoch 22/25
100/100 [==============================] - 0s 2ms/step - loss: 0.9327 - recall: 0.7722 - val_loss: 0.4631 - val_recall: 0.7454
Epoch 23/25
100/100 [==============================] - 0s 2ms/step - loss: 0.9308 - recall: 0.7692 - val_loss: 0.4762 - val_recall: 0.7515
Epoch 24/25
100/100 [==============================] - 0s 2ms/step - loss: 0.9294 - recall: 0.7730 - val_loss: 0.4682 - val_recall: 0.7393
Epoch 25/25
100/100 [==============================] - 0s 2ms/step - loss: 0.9278 - recall: 0.7638 - val_loss: 0.4738 - val_recall: 0.7577
In [90]:
print("Time taken in seconds ",end-start)
Time taken in seconds  10.908873319625854

Loss Function

In [91]:
#Plotting Train Loss vs Validation Loss
plt.plot(history_1.history['loss'])
plt.plot(history_1.history['val_loss'])
plt.title('model loss')
plt.ylabel('Loss')
plt.xlabel('Epoch')
plt.legend(['train', 'validation'], loc='upper left')
plt.show()
No description has been provided for this image

Recall

In [92]:
#Plotting Train recall vs Validation recall
plt.plot(history_1.history['recall'])
plt.plot(history_1.history['val_recall'])
plt.title('model recall')
plt.ylabel('Recall')
plt.xlabel('Epoch')
plt.legend(['train', 'validation'], loc='upper left')
plt.show()
No description has been provided for this image
In [93]:
#Predicting the results using 0.5 as the threshold for training set
y_train_pred = model_1.predict(X_train)
y_train_pred = (y_train_pred > 0.5)
y_train_pred
200/200 [==============================] - 0s 2ms/step
Out[93]:
array([[ True],
       [False],
       [False],
       ...,
       [False],
       [ True],
       [False]])
In [94]:
#Predicting the results using 0.5 as the threshold for validation set
y_val_pred = model_1.predict(X_val)
y_val_pred = (y_val_pred > 0.5)
y_val_pred
50/50 [==============================] - 0s 1ms/step
Out[94]:
array([[ True],
       [False],
       [False],
       ...,
       [False],
       [ True],
       [ True]])
In [95]:
model_name = "NN with Adam"

train_metric_df.loc[model_name] = recall_score(y_train, y_train_pred)
valid_metric_df.loc[model_name] = recall_score(y_val, y_val_pred)

Classification Report

In [96]:
#Classification report on training set
cr = classification_report(y_train, y_train_pred)
print(cr)
              precision    recall  f1-score   support

         0.0       0.93      0.78      0.85      5096
         1.0       0.48      0.77      0.59      1304

    accuracy                           0.78      6400
   macro avg       0.70      0.78      0.72      6400
weighted avg       0.84      0.78      0.80      6400

In [97]:
#classification report on validation set
cr=classification_report(y_val,y_val_pred)
print(cr)
              precision    recall  f1-score   support

         0.0       0.93      0.77      0.84      1274
         1.0       0.46      0.76      0.57       326

    accuracy                           0.77      1600
   macro avg       0.69      0.76      0.71      1600
weighted avg       0.83      0.77      0.79      1600

In [98]:
# Generate a Classification Heatmap
report = classification_report(y_val, y_val_pred, output_dict=True)
df_report = pd.DataFrame(report).transpose()

plt.figure(figsize=(8, 5))
sns.heatmap(df_report.iloc[:-1, :-1], annot=True, cmap="YlGnBu", fmt=".2f")
plt.title("Classification Report Heatmap")
plt.show()
No description has been provided for this image
In [99]:
# Generate ROC Curve to show trade-off between True Positive Rate and False Positive Rate.

y_scores = model_1.predict(X_val).ravel()
fpr, tpr, thresholds = roc_curve(y_val, y_scores)
roc_auc = auc(fpr, tpr)

plt.figure()
plt.plot(fpr, tpr, label=f"AUC = {roc_auc:.2f}")
plt.plot([0, 1], [0, 1], linestyle='--')
plt.xlabel("False Positive Rate")
plt.ylabel("True Positive Rate")
plt.title("ROC Curve")
plt.legend()
plt.grid(True)
plt.show()
50/50 [==============================] - 0s 2ms/step
No description has been provided for this image

Focus on the Business Goal: We're predicting customer churn (1.0 = exited) — so catching as many churners as possible is key.

That makes Recall for class 1.0 (churners) the most important metric to watch.

This NN model using Adam Optimizer is showing greatly improved results for Recall.

For the Validation Set:

  • Class 0.0 (Stayed): Precision=0.93, Recall=0.77

  • Class 1.0 (Exited): Precision=0.46, Recall=0.76

  • An AUC of 0.85 shows a great improvement in distinguishing between churners and non-churners.

  • The model correctly identifies 76% of churners.

  • This model now catches over 3/4 of the churners, which is useful if the business can follow up with targeted retention campaigns.

  • Our model can currently identify ~76% of customers likely to churn. While it may incorrectly flag some customers who would stay, this recall level allows proactive retention efforts, such as targeted offers or outreach campaigns, to focus on a meaningful subset of at-risk customers.

Confusion matrix

In [100]:
#Calculating the confusion matrix on training set
make_confusion_matrix(y_train, y_train_pred)
No description has been provided for this image
In [102]:
#Calculating the confusion matrix on validation set
make_confusion_matrix(y_val,y_val_pred)
No description has been provided for this image

** Observations **

  • Out of all customers that actually churned, we were able to identify 76% of them based on Recall. While our predictions also flagged some loyal customers incorrectly (false positives), this trade-off may be acceptable when the cost of losing a customer is high and retention efforts are relatively inexpensive.

  • Recall (for churners) = TP / (TP + FN) = 247 / (247 + 79) ≈ 0.76

Model 2 - Neural Network with Adam Optimizer and Dropout¶

  • Now let's switch to a NN model using the Adam optimizer and Dropout, consisting of
    • four hidden layers with 32, 20, 14, and 7 neurons respectively (the first one receives the 11 input features)
    • Dropout after the first three hidden layers, with rates of 0.3, 0.2, and 0.1
    • activation function of ReLU.
    • Adam as the optimizer
In [103]:
# clears the current Keras session, resetting all layers and models previously created, freeing up memory and resources.
tf.keras.backend.clear_session()

#Fixing the seed for random number generators so that we can ensure we receive the same output every time
np.random.seed(2)
random.seed(2)
tf.random.set_seed(2)
In [104]:
# Initializing the neural network
model_2 = Sequential()

# First hidden layer (receives the input features) + Dropout
model_2.add(Dense(32, activation="relu", input_dim=X_train.shape[1]))
model_2.add(Dropout(0.3))

# Second hidden layer + Dropout
model_2.add(Dense(20, activation="relu"))
model_2.add(Dropout(0.2))

# Third hidden layer + Dropout
model_2.add(Dense(14, activation="relu"))
model_2.add(Dropout(0.1))

# Fourth hidden layer (no Dropout)
model_2.add(Dense(7, activation="relu"))

# Output layer
model_2.add(Dense(1, activation="sigmoid"))
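
As a reminder of what the Dropout layers do during training: units are zeroed at random with the given rate, and the surviving activations are scaled up by 1 / (1 - rate) so that the expected output is unchanged. At inference time the layer passes its input through untouched. The sketch below illustrates the scaling in plain Python. The activations are hypothetical, and the random per-unit masks that Keras draws are simplified here to five masks that each drop exactly one of five units (matching a rate of 0.2 on average).

```python
def inverted_dropout(activations, keep_mask, rate):
    """Zero the dropped units and rescale the kept ones by 1 / (1 - rate)."""
    scale = 1.0 / (1.0 - rate)
    return [a * scale if keep else 0.0 for a, keep in zip(activations, keep_mask)]

# Hypothetical activations of a 5-unit layer
activations = [1.0, 2.0, 3.0, 4.0, 5.0]

# Apply five masks, each dropping a different unit
outputs = []
for dropped in range(len(activations)):
    keep_mask = [i != dropped for i in range(len(activations))]
    outputs.append(inverted_dropout(activations, keep_mask, rate=0.2))

# Average each unit's output over the five masks
mean_output = [sum(column) / len(outputs) for column in zip(*outputs)]
print(mean_output)
```

Averaged over the masks, each unit's output equals its original activation, which is why no rescaling is needed when Dropout is switched off for prediction.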
In [105]:
optimizer = tf.keras.optimizers.Adam(learning_rate=0.001)    # defining Adam as the optimizer to be used
metric = tf.keras.metrics.Recall()
model_2.compile(loss='binary_crossentropy', optimizer=optimizer, metrics=[metric])
In [106]:
model_2.summary()
Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 dense (Dense)               (None, 32)                384       
                                                                 
 dropout (Dropout)           (None, 32)                0         
                                                                 
 dense_1 (Dense)             (None, 20)                660       
                                                                 
 dropout_1 (Dropout)         (None, 20)                0         
                                                                 
 dense_2 (Dense)             (None, 14)                294       
                                                                 
 dropout_2 (Dropout)         (None, 14)                0         
                                                                 
 dense_3 (Dense)             (None, 7)                 105       
                                                                 
 dense_4 (Dense)             (None, 1)                 8         
                                                                 
=================================================================
Total params: 1451 (5.67 KB)
Trainable params: 1451 (5.67 KB)
Non-trainable params: 0 (0.00 Byte)
_________________________________________________________________
In [107]:
#Fitting the ANN

start = time.time()
history_2 = model_2.fit(X_train, y_train,
                    validation_data=(X_val, y_val),
                    batch_size=batch_size,
                    epochs=epochs,
                    class_weight=cw_dict)
end = time.time()
Epoch 1/25
100/100 [==============================] - 1s 4ms/step - loss: 1.3649 - recall: 0.6035 - val_loss: 0.6372 - val_recall: 0.6135
Epoch 2/25
100/100 [==============================] - 0s 2ms/step - loss: 1.2804 - recall: 0.6411 - val_loss: 0.5708 - val_recall: 0.6779
Epoch 3/25
100/100 [==============================] - 0s 3ms/step - loss: 1.2196 - recall: 0.6580 - val_loss: 0.5737 - val_recall: 0.7546
Epoch 4/25
100/100 [==============================] - 0s 2ms/step - loss: 1.1805 - recall: 0.7094 - val_loss: 0.5532 - val_recall: 0.7485
Epoch 5/25
100/100 [==============================] - 0s 3ms/step - loss: 1.1639 - recall: 0.7170 - val_loss: 0.5247 - val_recall: 0.7393
Epoch 6/25
100/100 [==============================] - 0s 2ms/step - loss: 1.1294 - recall: 0.7224 - val_loss: 0.5184 - val_recall: 0.7669
Epoch 7/25
100/100 [==============================] - 0s 3ms/step - loss: 1.1152 - recall: 0.7423 - val_loss: 0.5332 - val_recall: 0.7945
Epoch 8/25
100/100 [==============================] - 0s 3ms/step - loss: 1.0919 - recall: 0.7561 - val_loss: 0.5175 - val_recall: 0.7883
Epoch 9/25
100/100 [==============================] - 0s 2ms/step - loss: 1.0805 - recall: 0.7538 - val_loss: 0.4933 - val_recall: 0.7699
Epoch 10/25
100/100 [==============================] - 0s 2ms/step - loss: 1.0667 - recall: 0.7324 - val_loss: 0.5287 - val_recall: 0.8006
Epoch 11/25
100/100 [==============================] - 0s 3ms/step - loss: 1.0685 - recall: 0.7477 - val_loss: 0.4924 - val_recall: 0.7669
Epoch 12/25
100/100 [==============================] - 0s 3ms/step - loss: 1.0486 - recall: 0.7485 - val_loss: 0.5047 - val_recall: 0.7975
Epoch 13/25
100/100 [==============================] - 0s 2ms/step - loss: 1.0420 - recall: 0.7646 - val_loss: 0.4861 - val_recall: 0.7638
Epoch 14/25
100/100 [==============================] - 0s 3ms/step - loss: 1.0363 - recall: 0.7600 - val_loss: 0.4871 - val_recall: 0.7607
Epoch 15/25
100/100 [==============================] - 0s 2ms/step - loss: 1.0268 - recall: 0.7607 - val_loss: 0.4928 - val_recall: 0.7761
Epoch 16/25
100/100 [==============================] - 0s 2ms/step - loss: 1.0169 - recall: 0.7462 - val_loss: 0.4787 - val_recall: 0.7546
Epoch 17/25
100/100 [==============================] - 0s 3ms/step - loss: 1.0088 - recall: 0.7569 - val_loss: 0.4826 - val_recall: 0.7607
Epoch 18/25
100/100 [==============================] - 0s 2ms/step - loss: 1.0170 - recall: 0.7577 - val_loss: 0.5145 - val_recall: 0.8067
Epoch 19/25
100/100 [==============================] - 0s 3ms/step - loss: 0.9953 - recall: 0.7653 - val_loss: 0.4841 - val_recall: 0.7730
Epoch 20/25
100/100 [==============================] - 0s 2ms/step - loss: 1.0037 - recall: 0.7577 - val_loss: 0.4899 - val_recall: 0.7791
Epoch 21/25
100/100 [==============================] - 0s 3ms/step - loss: 1.0067 - recall: 0.7600 - val_loss: 0.4893 - val_recall: 0.7761
Epoch 22/25
100/100 [==============================] - 0s 2ms/step - loss: 0.9996 - recall: 0.7646 - val_loss: 0.4654 - val_recall: 0.7423
Epoch 23/25
100/100 [==============================] - 0s 3ms/step - loss: 0.9929 - recall: 0.7569 - val_loss: 0.4830 - val_recall: 0.7515
Epoch 24/25
100/100 [==============================] - 0s 2ms/step - loss: 0.9929 - recall: 0.7638 - val_loss: 0.4792 - val_recall: 0.7607
Epoch 25/25
100/100 [==============================] - 0s 3ms/step - loss: 0.9862 - recall: 0.7569 - val_loss: 0.4979 - val_recall: 0.8037
In [108]:
print("Time taken in seconds ",end-start)
Time taken in seconds  11.215099096298218
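
The `fit` call above passes `class_weight=cw_dict`, which is defined earlier in the notebook and not shown in this section. As a hedged sketch, assuming `cw_dict` follows the common "balanced" heuristic (weight = n_samples / (n_classes * class_count)), the weights for the training split (5096 stayed, 1304 exited, per the support column of the training classification report) would come out as below. The function name `balanced_class_weights` is hypothetical.

```python
from collections import Counter

def balanced_class_weights(labels):
    """Return {class: n_samples / (n_classes * count)} (the 'balanced' heuristic)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * cnt) for cls, cnt in counts.items()}

# Class counts taken from the training-set support column (assumed split).
y_train_example = [0] * 5096 + [1] * 1304
weights = balanced_class_weights(y_train_example)
print(weights)
```

Under this assumption each churner counts roughly four times as much as a non-churner in the training loss. Keras applies `class_weight` to the training loss only, which may partly explain why the training loss (around 1.0) sits above the validation loss (around 0.5) in the log above.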

Loss Function

In [109]:
#Plotting Train Loss vs Validation Loss
plt.plot(history_2.history['loss'])
plt.plot(history_2.history['val_loss'])
plt.title('model loss')
plt.ylabel('Loss')
plt.xlabel('Epoch')
plt.legend(['train', 'validation'], loc='upper left')
plt.show()

Recall

In [110]:
#Plotting Train recall vs Validation recall
plt.plot(history_2.history['recall'])
plt.plot(history_2.history['val_recall'])
plt.title('model recall')
plt.ylabel('Recall')
plt.xlabel('Epoch')
plt.legend(['train', 'validation'], loc='upper left')
plt.show()
  • Observations:

  • Both train and validation recall increase over time, indicating that the model is learning to detect churners.

  • Validation recall is at or above training recall in most epochs, which is unusual but not a problem. A likely contributor is dropout, which is active while training metrics are computed and disabled for validation. Keras also reports training metrics as a running average over the epoch, while validation metrics are computed once the epoch has finished.

  • The model generalizes well and maintains high recall on unseen data, which matters for churn detection because missing a churner is more costly than a false alarm.

  • Validation recall fluctuates between roughly 0.74 and 0.81 over the last ten epochs and ends at 0.80, which is strong.
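
Model 2 contains Dropout layers, and Keras disables dropout when it computes validation metrics, so one plausible contributor to validation recall matching or exceeding training recall is that the training metrics come from the noisier, partially masked network. The sketch below is a minimal pure-Python illustration of inverted dropout, not Keras's implementation; `dropout_forward` is a hypothetical helper and the rate of 0.3 is only an example value.

```python
import random

def dropout_forward(activations, rate, training, rng):
    """Inverted dropout: zero each unit with probability `rate` in training
    and scale survivors by 1 / (1 - rate); pass through unchanged otherwise."""
    if not training or rate == 0.0:
        return list(activations)
    keep = 1.0 - rate
    return [a / keep if rng.random() < keep else 0.0 for a in activations]

rng = random.Random(2)
acts = [1.0] * 10

train_out = dropout_forward(acts, rate=0.3, training=True, rng=rng)
eval_out = dropout_forward(acts, rate=0.3, training=False, rng=rng)

print("training  :", train_out)   # each unit is either 0.0 or scaled by 1/0.7
print("inference :", eval_out)    # identical to the input
```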

In [111]:
#Predicting the results using 0.5 as the threshold for training set
y_train_pred = model_2.predict(X_train)
y_train_pred = (y_train_pred > 0.5)
y_train_pred
200/200 [==============================] - 0s 1ms/step
Out[111]:
array([[ True],
       [False],
       [False],
       ...,
       [False],
       [ True],
       [False]])
In [112]:
#Predicting the results using 0.5 as the threshold for validation set
y_val_pred = model_2.predict(X_val)
y_val_pred = (y_val_pred > 0.5)
y_val_pred
50/50 [==============================] - 0s 1ms/step
Out[112]:
array([[ True],
       [False],
       [False],
       ...,
       [False],
       [ True],
       [ True]])
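
The cells above turn predicted probabilities into labels with a fixed 0.5 cut-off. As an illustrative sketch on made-up scores (not the notebook's predictions), lowering the threshold flags more customers, which can only keep or raise recall and typically lowers precision. `recall_precision_at` is a hypothetical helper.

```python
def recall_precision_at(y_true, scores, threshold):
    """Recall and precision for the positive class at a given cut-off."""
    preds = [s > threshold for s in scores]
    tp = sum(1 for y, p in zip(y_true, preds) if y == 1 and p)
    fp = sum(1 for y, p in zip(y_true, preds) if y == 0 and p)
    fn = sum(1 for y, p in zip(y_true, preds) if y == 1 and not p)
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    return recall, precision

# Toy labels and scores, invented for illustration only.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.7, 0.45, 0.2, 0.6, 0.4, 0.35, 0.3, 0.1, 0.05]

for t in (0.5, 0.3):
    r, p = recall_precision_at(y_true, scores, t)
    print(f"threshold={t}: recall={r:.2f}, precision={p:.2f}")
```

On these toy values, moving the cut-off from 0.5 to 0.3 raises recall from 0.50 to 0.75 and lowers precision from 0.67 to 0.50.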
In [113]:
model_name = "NN with Adam and Dropout"

train_metric_df.loc[model_name] = recall_score(y_train, y_train_pred)
valid_metric_df.loc[model_name] = recall_score(y_val, y_val_pred)

Classification Report

In [114]:
#Classification report on training set
cr = classification_report(y_train, y_train_pred)
print(cr)
              precision    recall  f1-score   support

         0.0       0.94      0.75      0.83      5096
         1.0       0.45      0.80      0.58      1304

    accuracy                           0.76      6400
   macro avg       0.69      0.78      0.70      6400
weighted avg       0.84      0.76      0.78      6400

In [115]:
#classification report on validation set
cr=classification_report(y_val,y_val_pred)
print(cr)
              precision    recall  f1-score   support

         0.0       0.94      0.74      0.82      1274
         1.0       0.44      0.80      0.57       326

    accuracy                           0.75      1600
   macro avg       0.69      0.77      0.70      1600
weighted avg       0.83      0.75      0.77      1600

In [116]:
# Generate a Classification Heatmap
report = classification_report(y_val, y_val_pred, output_dict=True)
df_report = pd.DataFrame(report).transpose()

plt.figure(figsize=(8, 5))
sns.heatmap(df_report.iloc[:-1, :-1], annot=True, cmap="YlGnBu", fmt=".2f")
plt.title("Classification Report Heatmap")
plt.show()
In [117]:
# Generate ROC Curve to show trade-off between True Positive Rate and False Positive Rate.

y_scores = model_2.predict(X_val).ravel()
fpr, tpr, thresholds = roc_curve(y_val, y_scores)
roc_auc = auc(fpr, tpr)

plt.figure()
plt.plot(fpr, tpr, label=f"AUC = {roc_auc:.2f}")
plt.plot([0, 1], [0, 1], linestyle='--')
plt.xlabel("False Positive Rate")
plt.ylabel("True Positive Rate")
plt.title("ROC Curve")
plt.legend()
plt.grid(True)
plt.show()
50/50 [==============================] - 0s 1ms/step

Focus on the Business Goal: We're predicting customer churn (1.0 = exited) — so catching as many churners as possible is key.

That makes Recall for class 1.0 (churners) the most important metric to watch.

This NN model using **Adam Optimizer and Dropout is showing improved results for Recall.**

For the Validation Set:

  • Class 0.0 (Stayed): Precision=0.94, Recall=0.74

  • Class 1.0 (Exited): Precision=0.44, Recall=0.80

  • An AUC of 0.84 shows that this model separates churners from non-churners better than the original model.

  • The model correctly identifies 80% of churners.

  • This model now catches well over 3/4 of the churners, which is useful if the business can follow up with targeted retention campaigns.

  • Our model can currently identify ~80% of customers likely to churn. While it may incorrectly flag some customers who would stay, this recall level allows proactive retention efforts, such as targeted offers or outreach campaigns, to focus on a meaningful subset of at-risk customers.

Confusion matrix

In [118]:
#Calculating the confusion matrix on training set
make_confusion_matrix(y_train, y_train_pred)
In [119]:
#Calculating the confusion matrix on validation set
make_confusion_matrix(y_val,y_val_pred)

** Observations **

  • Of the 326 customers in the validation set who actually churned, the model identifies 262, a recall of about 80%. It also flags some loyal customers incorrectly (false positives); this trade-off may be acceptable when the cost of losing a customer is high and retention efforts are relatively inexpensive.
  • Recall (for churners) = TP / (TP + FN) = 262 / (262 + 64) ≈ 0.80
  • Training and validation recall are both about 0.80, so there is no sign of overfitting at this recall level.
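
To make the arithmetic behind the recall figure explicit, the sketch below recomputes it from the confusion-matrix counts quoted above (TP = 262, FN = 64). The false-positive count is not stated in the text, so it is only estimated from the rounded precision of 0.44 and should be read as approximate.

```python
def recall(tp, fn):
    """Share of actual positives that the model catches."""
    return tp / (tp + fn)

tp, fn = 262, 64                 # counts quoted in the observation above
print(f"recall = {recall(tp, fn):.4f}")

# Precision = TP / (TP + FP), so TP + FP (customers flagged) ~ TP / precision.
reported_precision = 0.44        # rounded value from the classification report
flagged_estimate = tp / reported_precision
fp_estimate = flagged_estimate - tp
print(f"customers flagged ~ {flagged_estimate:.0f}, "
      f"of which ~{fp_estimate:.0f} would have stayed")
```

The estimate suggests that a retention campaign driven by this model would contact roughly 600 of the 1600 validation customers to reach about 80% of the churners.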

Model 3 - Neural Network with Balanced Data (by applying SMOTE) and SGD Optimizer¶

- Now let's switch to a NN model with Balanced Data (by applying SMOTE) and SGD Optimizer

  • one input layer with 32 neurons
  • two hidden layers with 20 and 14 neurons respectively
  • activation function of ReLU.

Let's apply SMOTE to balance this dataset and then again apply hyperparameter tuning accordingly.

In [138]:
sm = SMOTE(random_state=42)
# Fit SMOTE on the training data and create balanced versions
X_train_smote, y_train_smote = sm.fit_resample(X_train, y_train)

print('After UpSampling, the shape of train_X: {}'.format(X_train_smote.shape))
print('After UpSampling, the shape of train_y: {} \n'.format(y_train_smote.shape))
After UpSampling, the shape of train_X: (10192, 11)
After UpSampling, the shape of train_y: (10192,) 
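
SMOTE balances the classes by synthesizing new minority-class rows rather than duplicating existing ones. The sketch below shows only the core interpolation step on toy 2-D points; it is a simplification and not imbalanced-learn's implementation, which also performs the nearest-neighbour search and decides how many rows to create. `smote_like_sample` is a hypothetical name.

```python
import random

def smote_like_sample(x, neighbor, rng):
    """Create one synthetic point on the segment between x and a minority neighbor."""
    lam = rng.random()  # interpolation factor in [0, 1)
    return [xi + lam * (ni - xi) for xi, ni in zip(x, neighbor)]

rng = random.Random(42)
x = [1.0, 2.0]            # a minority-class point (toy values)
neighbor = [3.0, 6.0]     # one of its minority-class neighbours (toy values)
synthetic = smote_like_sample(x, neighbor, rng)
print(synthetic)

# Row count check for this notebook: the minority class is raised to the
# majority count, so 5096 + 5096 = 10192 training rows after resampling.
majority, minority = 5096, 1304
print(majority * 2, "rows,", majority - minority, "of them synthetic")
```

Because only the training split is resampled, the validation metrics still reflect the original class ratio (1274 stayed, 326 exited).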

Let's build a model with the balanced dataset

In [139]:
# clears the current Keras session, resetting all layers and models previously created, freeing up memory and resources.
tf.keras.backend.clear_session()

#Fixing the seed for random number generators so that we can ensure we receive the same output every time
np.random.seed(2)
random.seed(2)
tf.random.set_seed(2)
In [140]:
# Initializing the neural network
model_3 = Sequential()

# Input layer
model_3.add(Dense(32, activation="relu", input_dim=X_train_smote.shape[1]))

# First hidden layer
model_3.add(Dense(20, activation="relu"))

# Second hidden layer
model_3.add(Dense(14, activation="relu"))

# Output layer
model_3.add(Dense(1, activation="sigmoid"))
In [141]:
optimizer = tf.keras.optimizers.SGD(learning_rate=0.001)    # defining SGD as the optimizer to be used
metric = tf.keras.metrics.Recall()
model_3.compile(loss='binary_crossentropy', optimizer=optimizer, metrics=[metric])
In [142]:
model_3.summary()
Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 dense (Dense)               (None, 32)                384       
                                                                 
 dense_1 (Dense)             (None, 20)                660       
                                                                 
 dense_2 (Dense)             (None, 14)                294       
                                                                 
 dense_3 (Dense)             (None, 1)                 15        
                                                                 
=================================================================
Total params: 1353 (5.29 KB)
Trainable params: 1353 (5.29 KB)
Non-trainable params: 0 (0.00 Byte)
_________________________________________________________________
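
The parameter counts in the summary can be checked by hand: a Dense layer with `n_in` inputs and `n_out` units has `n_in * n_out` weights plus `n_out` biases. The sketch below assumes 11 input features, as shown by the SMOTE output shape `(10192, 11)`; `dense_params` is a hypothetical helper.

```python
def dense_params(n_in, n_out):
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out

layer_sizes = [11, 32, 20, 14, 1]   # input features, then units per Dense layer
per_layer = [dense_params(a, b) for a, b in zip(layer_sizes, layer_sizes[1:])]
print(per_layer, sum(per_layer))    # [384, 660, 294, 15] 1353
```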
In [143]:
# Fitting the ANN without class_weight since SMOTE is already applied
start = time.time()
history_3 = model_3.fit(X_train_smote, y_train_smote,
                        validation_data=(X_val, y_val),
                        batch_size=batch_size,
                        epochs=epochs)
end = time.time()
Epoch 1/25
160/160 [==============================] - 1s 3ms/step - loss: 0.6907 - recall: 0.0850 - val_loss: 0.6396 - val_recall: 0.0859
Epoch 2/25
160/160 [==============================] - 0s 2ms/step - loss: 0.6876 - recall: 0.1303 - val_loss: 0.6407 - val_recall: 0.1227
Epoch 3/25
160/160 [==============================] - 0s 2ms/step - loss: 0.6847 - recall: 0.1890 - val_loss: 0.6410 - val_recall: 0.1748
Epoch 4/25
160/160 [==============================] - 0s 2ms/step - loss: 0.6820 - recall: 0.2288 - val_loss: 0.6408 - val_recall: 0.2270
Epoch 5/25
160/160 [==============================] - 0s 2ms/step - loss: 0.6795 - recall: 0.2682 - val_loss: 0.6403 - val_recall: 0.2761
Epoch 6/25
160/160 [==============================] - 0s 2ms/step - loss: 0.6770 - recall: 0.3104 - val_loss: 0.6392 - val_recall: 0.3037
Epoch 7/25
160/160 [==============================] - 0s 2ms/step - loss: 0.6746 - recall: 0.3411 - val_loss: 0.6380 - val_recall: 0.3282
Epoch 8/25
160/160 [==============================] - 0s 2ms/step - loss: 0.6722 - recall: 0.3787 - val_loss: 0.6365 - val_recall: 0.3466
Epoch 9/25
160/160 [==============================] - 0s 2ms/step - loss: 0.6699 - recall: 0.4029 - val_loss: 0.6346 - val_recall: 0.3650
Epoch 10/25
160/160 [==============================] - 0s 2ms/step - loss: 0.6675 - recall: 0.4223 - val_loss: 0.6325 - val_recall: 0.3712
Epoch 11/25
160/160 [==============================] - 0s 2ms/step - loss: 0.6651 - recall: 0.4368 - val_loss: 0.6304 - val_recall: 0.3804
Epoch 12/25
160/160 [==============================] - 0s 2ms/step - loss: 0.6627 - recall: 0.4570 - val_loss: 0.6279 - val_recall: 0.3804
Epoch 13/25
160/160 [==============================] - 0s 2ms/step - loss: 0.6603 - recall: 0.4674 - val_loss: 0.6257 - val_recall: 0.4049
Epoch 14/25
160/160 [==============================] - 0s 2ms/step - loss: 0.6578 - recall: 0.4853 - val_loss: 0.6230 - val_recall: 0.4233
Epoch 15/25
160/160 [==============================] - 0s 2ms/step - loss: 0.6553 - recall: 0.4957 - val_loss: 0.6202 - val_recall: 0.4387
Epoch 16/25
160/160 [==============================] - 0s 2ms/step - loss: 0.6528 - recall: 0.5094 - val_loss: 0.6175 - val_recall: 0.4448
Epoch 17/25
160/160 [==============================] - 0s 3ms/step - loss: 0.6503 - recall: 0.5186 - val_loss: 0.6148 - val_recall: 0.4509
Epoch 18/25
160/160 [==============================] - 0s 3ms/step - loss: 0.6477 - recall: 0.5312 - val_loss: 0.6116 - val_recall: 0.4540
Epoch 19/25
160/160 [==============================] - 0s 2ms/step - loss: 0.6451 - recall: 0.5398 - val_loss: 0.6086 - val_recall: 0.4663
Epoch 20/25
160/160 [==============================] - 0s 2ms/step - loss: 0.6425 - recall: 0.5410 - val_loss: 0.6058 - val_recall: 0.4816
Epoch 21/25
160/160 [==============================] - 0s 3ms/step - loss: 0.6398 - recall: 0.5500 - val_loss: 0.6028 - val_recall: 0.4877
Epoch 22/25
160/160 [==============================] - 0s 2ms/step - loss: 0.6371 - recall: 0.5595 - val_loss: 0.5997 - val_recall: 0.5031
Epoch 23/25
160/160 [==============================] - 0s 2ms/step - loss: 0.6343 - recall: 0.5626 - val_loss: 0.5973 - val_recall: 0.5123
Epoch 24/25
160/160 [==============================] - 0s 2ms/step - loss: 0.6316 - recall: 0.5742 - val_loss: 0.5944 - val_recall: 0.5215
Epoch 25/25
160/160 [==============================] - 0s 2ms/step - loss: 0.6288 - recall: 0.5810 - val_loss: 0.5917 - val_recall: 0.5276
In [144]:
print("Time taken in seconds ",end-start)
Time taken in seconds  9.132063150405884

Loss Function

In [145]:
#Plotting Train Loss vs Validation Loss
plt.plot(history_3.history['loss'])
plt.plot(history_3.history['val_loss'])
plt.title('model loss')
plt.ylabel('Loss')
plt.xlabel('Epoch')
plt.legend(['train', 'validation'], loc='upper left')
plt.show()

Recall

In [146]:
#Plotting Train recall vs Validation recall
plt.plot(history_3.history['recall'])
plt.plot(history_3.history['val_recall'])
plt.title('model recall')
plt.ylabel('Recall')
plt.xlabel('Epoch')
plt.legend(['train', 'validation'], loc='upper left')
plt.show()
  • Observations:

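  • Both train and validation recall rise steadily across all 25 epochs and are still climbing at the end (0.58 train, 0.53 validation), so training has not converged.

  • The training loss falls slowly (0.69 to 0.63), which suggests that plain SGD with a learning rate of 0.001 needs more epochs or a larger learning rate on this data.

  • The gap between training and validation recall is small (about 0.05), so there is no sign of overfitting; the model is underfit rather than overfit.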
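
One plausible reason for the slow progress is that plain SGD takes steps proportional to the gradient, which are tiny when gradients are small, whereas Adam normalizes the step size. The sketch below compares the two on a toy one-dimensional quadratic, not on the network itself. The objective, the starting point, and the use of 25 × 160 = 4000 updates are illustrative assumptions, and the Adam update is the textbook form with commonly used defaults rather than Keras's exact implementation.

```python
import math

def grad(w):
    """Gradient of the toy objective f(w) = 0.5 * 0.01 * w**2."""
    return 0.01 * w

def run_sgd(w, lr, steps):
    for _ in range(steps):
        w -= lr * grad(w)
    return w

def run_adam(w, lr, steps, beta1=0.9, beta2=0.999, eps=1e-7):
    m = v = 0.0
    for t in range(1, steps + 1):
        g = grad(w)
        m = beta1 * m + (1 - beta1) * g          # first-moment estimate
        v = beta2 * v + (1 - beta2) * g * g      # second-moment estimate
        m_hat = m / (1 - beta1 ** t)             # bias correction
        v_hat = v / (1 - beta2 ** t)
        w -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return w

steps = 25 * 160          # epochs * batches per epoch, as in the log above
w_sgd = run_sgd(5.0, lr=0.001, steps=steps)
w_adam = run_adam(5.0, lr=0.001, steps=steps)
print(f"SGD : w = {w_sgd:.3f}")   # barely moved from 5.0
print(f"Adam: w = {w_adam:.3f}")  # much closer to the minimum at 0
```

This is only an analogy: it shows why the same learning rate of 0.001 can mean very different effective step sizes for the two optimizers, not what happens inside the churn model.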

In [147]:
#Predicting the results using 0.5 as the threshold for training set
y_train_pred = model_3.predict(X_train_smote)
y_train_pred = (y_train_pred > 0.5)
y_train_pred
319/319 [==============================] - 0s 1ms/step
Out[147]:
array([[ True],
       [False],
       [False],
       ...,
       [False],
       [ True],
       [ True]])
In [148]:
#Predicting the results using 0.5 as the threshold for validation set
y_val_pred = model_3.predict(X_val)
y_val_pred = (y_val_pred > 0.5)
y_val_pred
50/50 [==============================] - 0s 1ms/step
Out[148]:
array([[ True],
       [False],
       [False],
       ...,
       [False],
       [ True],
       [False]])
In [149]:
model_name = "NN with SMOTE and SGD"

train_metric_df.loc[model_name] = recall_score(y_train_smote, y_train_pred)
valid_metric_df.loc[model_name] = recall_score(y_val, y_val_pred)

Classification Report

In [150]:
#Classification report on training set
cr = classification_report(y_train_smote, y_train_pred)
print(cr)
              precision    recall  f1-score   support

         0.0       0.65      0.78      0.71      5096
         1.0       0.73      0.59      0.65      5096

    accuracy                           0.68     10192
   macro avg       0.69      0.68      0.68     10192
weighted avg       0.69      0.68      0.68     10192

In [151]:
#classification report on validation set
cr=classification_report(y_val,y_val_pred)
print(cr)
              precision    recall  f1-score   support

         0.0       0.87      0.78      0.82      1274
         1.0       0.38      0.53      0.45       326

    accuracy                           0.73      1600
   macro avg       0.63      0.66      0.63      1600
weighted avg       0.77      0.73      0.75      1600

In [152]:
# Generate a Classification Heatmap
report = classification_report(y_val, y_val_pred, output_dict=True)
df_report = pd.DataFrame(report).transpose()

plt.figure(figsize=(8, 5))
sns.heatmap(df_report.iloc[:-1, :-1], annot=True, cmap="YlGnBu", fmt=".2f")
plt.title("Classification Report Heatmap")
plt.show()
In [153]:
# Generate ROC Curve to show trade-off between True Positive Rate and False Positive Rate.

y_scores = model_3.predict(X_val).ravel()
fpr, tpr, thresholds = roc_curve(y_val, y_scores)
roc_auc = auc(fpr, tpr)

plt.figure()
plt.plot(fpr, tpr, label=f"AUC = {roc_auc:.2f}")
plt.plot([0, 1], [0, 1], linestyle='--')
plt.xlabel("False Positive Rate")
plt.ylabel("True Positive Rate")
plt.title("ROC Curve")
plt.legend()
plt.grid(True)
plt.show()
50/50 [==============================] - 0s 1ms/step

Focus on the Business Goal: We're predicting customer churn (1.0 = exited) — so catching as many churners as possible is key.

That makes Recall for class 1.0 (churners) the most important metric to watch.

This NN model using **Balanced Data with SMOTE and SGD** reaches a validation recall of only 0.53, well below the 0.80 of the Adam Optimizer with Dropout model.

For the Validation Set:

  • Class 0.0 (Stayed): Precision=0.87, Recall=0.78

  • Class 1.0 (Exited): Precision=0.38, Recall=0.53

  • AUC reflects the model's ability to rank churners above non-churners, and the drop in AUC from 0.76 to 0.73 shows worse ranking ability.
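
The ranking interpretation of AUC can be made concrete: it equals the fraction of (churner, non-churner) pairs in which the churner receives the higher score, counting ties as half. The sketch below computes this on made-up scores rather than the notebook's predictions; `pairwise_auc` is a hypothetical helper, and for real data `sklearn.metrics.roc_auc_score` is the usual tool.

```python
def pairwise_auc(y_true, scores):
    """AUC as the probability that a random positive outranks a random negative."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5   # ties count as half a win
    return wins / (len(pos) * len(neg))

# Toy labels and scores, invented for illustration only.
y_true = [1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.6, 0.3, 0.7, 0.4, 0.2, 0.1]
print(pairwise_auc(y_true, scores))   # 9 of 12 pairs ranked correctly -> 0.75
```

Unlike recall at a 0.5 cut-off, this measure does not depend on any threshold, which is why AUC and recall can move in different directions between models.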

Confusion matrix

In [154]:
#Calculating the confusion matrix on training set
make_confusion_matrix(y_train_smote, y_train_pred)
In [155]:
#Calculating the confusion matrix on validation set
make_confusion_matrix(y_val,y_val_pred)

** Observations **

  • With a recall of 53%, this model identifies just over half of the customers who actually churned. Precision of 38% means that roughly 6 in 10 customers it flags would not have churned, so both metrics are weaker than for the Adam with Dropout model.
  • Recall (for churners) = TP / (TP + FN) = 172 / (172 + 154) ≈ 0.53
  • The small gap between training recall (0.59) and validation recall (0.53) shows no overfitting, but recall is low on both sets, so this model underfits.

Model 4 - Neural Network with Balanced Data (by applying SMOTE) and Adam Optimizer¶

- Now let's switch to a NN model with Balanced Data (by applying SMOTE) and Adam Optimizer

  • one input layer with 32 neurons
  • two hidden layers with 20 and 14 neurons respectively
  • activation function of ReLU.

Let's apply SMOTE to balance this dataset and then again apply hyperparameter tuning accordingly.

In [156]:
sm = SMOTE(random_state=42)
# Fit SMOTE on the training data and create balanced versions
X_train_smote, y_train_smote = sm.fit_resample(X_train, y_train)

print('After UpSampling, the shape of train_X: {}'.format(X_train_smote.shape))
print('After UpSampling, the shape of train_y: {} \n'.format(y_train_smote.shape))
After UpSampling, the shape of train_X: (10192, 11)
After UpSampling, the shape of train_y: (10192,) 

Let's build a model with the balanced dataset

In [157]:
# clears the current Keras session, resetting all layers and models previously created, freeing up memory and resources.
tf.keras.backend.clear_session()

#Fixing the seed for random number generators so that we can ensure we receive the same output every time
np.random.seed(2)
random.seed(2)
tf.random.set_seed(2)
In [158]:
# Initializing the neural network
model_4 = Sequential()

# Input layer
model_4.add(Dense(32, activation="relu", input_dim=X_train_smote.shape[1]))

# First hidden layer
model_4.add(Dense(20, activation="relu"))

# Second hidden layer
model_4.add(Dense(14, activation="relu"))

# Output layer
model_4.add(Dense(1, activation="sigmoid"))
In [159]:
optimizer = tf.keras.optimizers.Adam(learning_rate=0.001)    # defining Adam as the optimizer to be used
metric = tf.keras.metrics.Recall()
model_4.compile(loss='binary_crossentropy', optimizer=optimizer, metrics=[metric])
In [160]:
model_4.summary()
Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 dense (Dense)               (None, 32)                384       
                                                                 
 dense_1 (Dense)             (None, 20)                660       
                                                                 
 dense_2 (Dense)             (None, 14)                294       
                                                                 
 dense_3 (Dense)             (None, 1)                 15        
                                                                 
=================================================================
Total params: 1353 (5.29 KB)
Trainable params: 1353 (5.29 KB)
Non-trainable params: 0 (0.00 Byte)
_________________________________________________________________
In [161]:
# Fitting the ANN without class_weight since SMOTE is already applied
start = time.time()
history_4 = model_4.fit(X_train_smote, y_train_smote,
                        validation_data=(X_val, y_val),
                        batch_size=batch_size,
                        epochs=epochs)
end = time.time()
Epoch 1/25
160/160 [==============================] - 2s 5ms/step - loss: 0.6041 - recall: 0.6717 - val_loss: 0.5434 - val_recall: 0.7362
Epoch 2/25
160/160 [==============================] - 0s 2ms/step - loss: 0.4916 - recall: 0.7867 - val_loss: 0.5117 - val_recall: 0.7975
Epoch 3/25
160/160 [==============================] - 0s 2ms/step - loss: 0.4592 - recall: 0.7865 - val_loss: 0.4579 - val_recall: 0.7270
Epoch 4/25
160/160 [==============================] - 0s 2ms/step - loss: 0.4452 - recall: 0.7920 - val_loss: 0.4291 - val_recall: 0.6810
Epoch 5/25
160/160 [==============================] - 0s 2ms/step - loss: 0.4355 - recall: 0.7869 - val_loss: 0.4677 - val_recall: 0.7546
Epoch 6/25
160/160 [==============================] - 0s 2ms/step - loss: 0.4285 - recall: 0.7963 - val_loss: 0.4410 - val_recall: 0.6994
Epoch 7/25
160/160 [==============================] - 0s 2ms/step - loss: 0.4232 - recall: 0.8020 - val_loss: 0.4385 - val_recall: 0.6902
Epoch 8/25
160/160 [==============================] - 0s 2ms/step - loss: 0.4163 - recall: 0.8020 - val_loss: 0.4397 - val_recall: 0.6963
Epoch 9/25
160/160 [==============================] - 0s 2ms/step - loss: 0.4123 - recall: 0.8063 - val_loss: 0.4362 - val_recall: 0.7209
Epoch 10/25
160/160 [==============================] - 0s 3ms/step - loss: 0.4072 - recall: 0.8116 - val_loss: 0.4414 - val_recall: 0.7086
Epoch 11/25
160/160 [==============================] - 0s 3ms/step - loss: 0.4043 - recall: 0.8102 - val_loss: 0.4434 - val_recall: 0.7270
Epoch 12/25
160/160 [==============================] - 0s 3ms/step - loss: 0.4000 - recall: 0.8146 - val_loss: 0.4324 - val_recall: 0.6810
Epoch 13/25
160/160 [==============================] - 0s 3ms/step - loss: 0.3965 - recall: 0.8138 - val_loss: 0.4234 - val_recall: 0.6871
Epoch 14/25
160/160 [==============================] - 0s 3ms/step - loss: 0.3918 - recall: 0.8248 - val_loss: 0.4494 - val_recall: 0.7025
Epoch 15/25
160/160 [==============================] - 0s 3ms/step - loss: 0.3870 - recall: 0.8252 - val_loss: 0.4720 - val_recall: 0.7546
Epoch 16/25
160/160 [==============================] - 0s 3ms/step - loss: 0.3827 - recall: 0.8287 - val_loss: 0.4393 - val_recall: 0.6963
Epoch 17/25
160/160 [==============================] - 0s 2ms/step - loss: 0.3807 - recall: 0.8307 - val_loss: 0.4562 - val_recall: 0.7147
Epoch 18/25
160/160 [==============================] - 0s 2ms/step - loss: 0.3770 - recall: 0.8358 - val_loss: 0.4309 - val_recall: 0.6718
Epoch 19/25
160/160 [==============================] - 0s 2ms/step - loss: 0.3746 - recall: 0.8316 - val_loss: 0.4475 - val_recall: 0.6902
Epoch 20/25
160/160 [==============================] - 0s 2ms/step - loss: 0.3708 - recall: 0.8348 - val_loss: 0.4533 - val_recall: 0.6963
Epoch 21/25
160/160 [==============================] - 0s 2ms/step - loss: 0.3692 - recall: 0.8420 - val_loss: 0.4154 - val_recall: 0.6411
Epoch 22/25
160/160 [==============================] - 0s 2ms/step - loss: 0.3683 - recall: 0.8399 - val_loss: 0.4283 - val_recall: 0.6534
Epoch 23/25
160/160 [==============================] - 0s 2ms/step - loss: 0.3630 - recall: 0.8444 - val_loss: 0.4573 - val_recall: 0.6933
Epoch 24/25
160/160 [==============================] - 0s 2ms/step - loss: 0.3626 - recall: 0.8426 - val_loss: 0.4455 - val_recall: 0.6779
Epoch 25/25
160/160 [==============================] - 0s 2ms/step - loss: 0.3589 - recall: 0.8462 - val_loss: 0.4509 - val_recall: 0.6687
In [162]:
print("Time taken in seconds ",end-start)
Time taken in seconds  10.486416816711426

Loss Function

In [163]:
#Plotting Train Loss vs Validation Loss
plt.plot(history_4.history['loss'])
plt.plot(history_4.history['val_loss'])
plt.title('model loss')
plt.ylabel('Loss')
plt.xlabel('Epoch')
plt.legend(['train', 'validation'], loc='upper left')
plt.show()

Recall

In [164]:
#Plotting Train recall vs Validation recall
plt.plot(history_4.history['recall'])
plt.plot(history_4.history['val_recall'])
plt.title('model recall')
plt.ylabel('Recall')
plt.xlabel('Epoch')
plt.legend(['train', 'validation'], loc='upper left')
plt.show()
  • Observations:

  • Training recall keeps rising (0.67 to 0.85) while validation recall drifts down from a peak of 0.80 at epoch 2 to 0.67 at epoch 25. The widening gap suggests the model is overfitting the SMOTE-balanced training data.

  • The model still catches about 67% of churners on the validation set at the end of training, which is usable but less stable from epoch to epoch.
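
Since validation recall peaks early and then declines, one option is to keep the weights from the best validation epoch rather than the last one (Keras offers the `EarlyStopping` callback with `restore_best_weights=True` for this). The sketch below only locates that epoch from the `val_recall` values in the Model 4 training log above; it does not retrain anything.

```python
# val_recall per epoch for Model 4, copied from the training log above.
val_recall = [
    0.7362, 0.7975, 0.7270, 0.6810, 0.7546, 0.6994, 0.6902, 0.6963, 0.7209,
    0.7086, 0.7270, 0.6810, 0.6871, 0.7025, 0.7546, 0.6963, 0.7147, 0.6718,
    0.6902, 0.6963, 0.6411, 0.6534, 0.6933, 0.6779, 0.6687,
]

# Epochs are 1-based in the Keras log, list indices are 0-based.
best_epoch = max(range(len(val_recall)), key=val_recall.__getitem__) + 1
final_train_recall = 0.8462   # training recall at epoch 25, from the same log
gap = final_train_recall - val_recall[-1]

print(f"best validation recall {val_recall[best_epoch - 1]:.4f} at epoch {best_epoch}")
print(f"train/validation gap at epoch 25: {gap:.4f}")
```

Picking the epoch on the validation set makes the validation score slightly optimistic, so a final check on held-out test data would still be needed.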

In [165]:
#Predicting the results using 0.5 as the threshold for training set
y_train_pred = model_4.predict(X_train_smote)
y_train_pred = (y_train_pred > 0.5)
y_train_pred
319/319 [==============================] - 0s 1ms/step
Out[165]:
array([[ True],
       [False],
       [False],
       ...,
       [ True],
       [ True],
       [ True]])
In [166]:
#Predicting the results using 0.5 as the threshold for validation set
y_val_pred = model_4.predict(X_val)
y_val_pred = (y_val_pred > 0.5)
y_val_pred
50/50 [==============================] - 0s 1ms/step
Out[166]:
array([[False],
       [False],
       [False],
       ...,
       [False],
       [ True],
       [ True]])
In [167]:
model_name = "NN with SMOTE and Adam"

train_metric_df.loc[model_name] = recall_score(y_train_smote, y_train_pred)
valid_metric_df.loc[model_name] = recall_score(y_val, y_val_pred)

Classification Report

In [168]:
#Classification report on training set
cr = classification_report(y_train_smote, y_train_pred)
print(cr)
              precision    recall  f1-score   support

         0.0       0.85      0.84      0.85      5096
         1.0       0.84      0.85      0.85      5096

    accuracy                           0.85     10192
   macro avg       0.85      0.85      0.85     10192
weighted avg       0.85      0.85      0.85     10192

In [169]:
#classification report on validation set
cr=classification_report(y_val,y_val_pred)
print(cr)
              precision    recall  f1-score   support

         0.0       0.91      0.81      0.86      1274
         1.0       0.48      0.67      0.56       326

    accuracy                           0.78      1600
   macro avg       0.69      0.74      0.71      1600
weighted avg       0.82      0.78      0.80      1600

In [170]:
# Generate a Classification Heatmap
report = classification_report(y_val, y_val_pred, output_dict=True)
df_report = pd.DataFrame(report).transpose()

plt.figure(figsize=(8, 5))
sns.heatmap(df_report.iloc[:-1, :-1], annot=True, cmap="YlGnBu", fmt=".2f")
plt.title("Classification Report Heatmap")
plt.show()
In [171]:
# Generate ROC Curve to show trade-off between True Positive Rate and False Positive Rate.

y_scores = model_4.predict(X_val).ravel()
fpr, tpr, thresholds = roc_curve(y_val, y_scores)
roc_auc = auc(fpr, tpr)

plt.figure()
plt.plot(fpr, tpr, label=f"AUC = {roc_auc:.2f}")
plt.plot([0, 1], [0, 1], linestyle='--')
plt.xlabel("False Positive Rate")
plt.ylabel("True Positive Rate")
plt.title("ROC Curve")
plt.legend()
plt.grid(True)
plt.show()
50/50 [==============================] - 0s 1ms/step

Focus on the Business Goal: We're predicting customer churn (1.0 = exited) — so catching as many churners as possible is key.

That makes Recall for class 1.0 (churners) the most important metric to watch.

This NN model using **Balanced Data with SMOTE and Adam** improves on the SGD version (validation recall 0.67 vs. 0.53) but still falls short of the Adam Optimizer with Dropout model above (0.80).

For the Validation Set:

  • Class 0.0 (Stayed): Precision=0.91, Recall=0.81

  • Class 1.0 (Exited): Precision=0.48, Recall=0.67

  • Recall = 0.67 → We're identifying 67% of actual churners.

  • Model 4 is an improvement in recall, precision, and F1 from Model 3.

  • AUC reflects the model's ability to rank churners above non-churners, and the rise in AUC from 0.73 to 0.83 shows improved ranking ability.

Confusion matrix

In [172]:
#Calculating the confusion matrix on training set
make_confusion_matrix(y_train_smote, y_train_pred)
In [173]:
#Calculating the confusion matrix on validation set
make_confusion_matrix(y_val,y_val_pred)

** Observations **

  • With a recall of 67%, this model identifies about two thirds of the customers who actually churned. Precision of 48% means that roughly half of the customers it flags would not have churned.
  • Recall (for churners) = TP / (TP + FN) = 218 / (218 + 108) ≈ 0.67
  • Training recall (0.85) is well above validation recall (0.67), so this model overfits more than the earlier ones; regularization such as dropout is a natural next step.

Model 5 - Neural Network with Balanced Data (by applying SMOTE), Adam Optimizer, and Dropout¶

- Now let's switch to a NN model with Balanced Data (by applying SMOTE) and Adam Optimizer Plus Dropout

  • one input layer with 32 neurons
  • three hidden layers with 20, 14, and 7 neurons respectively
  • activation function of ReLU.
  • Adam as the optimizer with dropout

Let's apply SMOTE to balance this dataset and then again apply hyperparameter tuning accordingly.

In [174]:
sm = SMOTE(random_state=42)
# Fit SMOTE on the training data and create balanced versions
X_train_smote, y_train_smote = sm.fit_resample(X_train, y_train)

print('After UpSampling, the shape of train_X: {}'.format(X_train_smote.shape))
print('After UpSampling, the shape of train_y: {} \n'.format(y_train_smote.shape))
After UpSampling, the shape of train_X: (10192, 11)
After UpSampling, the shape of train_y: (10192,) 

Let's build a model with the balanced dataset

In [175]:
# clears the current Keras session, resetting all layers and models previously created, freeing up memory and resources.
tf.keras.backend.clear_session()

# Fixing the seeds for the random number generators so that we get the same output every time
np.random.seed(2)
random.seed(2)
tf.random.set_seed(2)
In [176]:
# Initializing the neural network
model_5 = Sequential()

# Input layer + Dropout
model_5.add(Dense(32, activation="relu", input_dim=X_train_smote.shape[1]))
model_5.add(Dropout(0.3))

# First hidden layer + Dropout
model_5.add(Dense(20, activation="relu"))
model_5.add(Dropout(0.2))

# Second hidden layer + Dropout
model_5.add(Dense(14, activation="relu"))
model_5.add(Dropout(0.1))

# Third hidden layer (no Dropout before the output layer)
model_5.add(Dense(7, activation="relu"))

# Output layer
model_5.add(Dense(1, activation="sigmoid"))
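
As a reminder of what the `Dropout` layers above do during training: each one zeroes a random fraction `rate` of its inputs and scales the survivors by 1 / (1 - rate) so the expected activation is unchanged. At prediction time the layer passes values through untouched. Below is a minimal plain-Python sketch of that idea. The helper name `inverted_dropout` is hypothetical and this is not the Keras implementation.

```python
import random

def inverted_dropout(activations, rate, rng):
    """Zero each activation with probability `rate`; rescale the rest by 1 / (1 - rate)."""
    keep_prob = 1.0 - rate
    return [a / keep_prob if rng.random() < keep_prob else 0.0 for a in activations]

rng = random.Random(2)
layer_output = [1.0] * 10          # pretend activations from a Dense layer
dropped = inverted_dropout(layer_output, rate=0.3, rng=rng)
print(dropped)                     # a mix of 0.0 and 1 / 0.7 values
```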
In [177]:
optimizer = tf.keras.optimizers.Adam(learning_rate=0.001)    # defining Adam as the optimizer to be used
metric = tf.keras.metrics.Recall()
model_5.compile(loss='binary_crossentropy', optimizer=optimizer, metrics=[metric])
In [178]:
model_5.summary()
Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 dense (Dense)               (None, 32)                384       
                                                                 
 dropout (Dropout)           (None, 32)                0         
                                                                 
 dense_1 (Dense)             (None, 20)                660       
                                                                 
 dropout_1 (Dropout)         (None, 20)                0         
                                                                 
 dense_2 (Dense)             (None, 14)                294       
                                                                 
 dropout_2 (Dropout)         (None, 14)                0         
                                                                 
 dense_3 (Dense)             (None, 7)                 105       
                                                                 
 dense_4 (Dense)             (None, 1)                 8         
                                                                 
=================================================================
Total params: 1451 (5.67 KB)
Trainable params: 1451 (5.67 KB)
Non-trainable params: 0 (0.00 Byte)
_________________________________________________________________
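
The parameter counts in the summary can be checked by hand: a Dense layer with `n_in` inputs and `n_out` units has `(n_in + 1) * n_out` weights (the `+ 1` is the bias), and Dropout layers add none. A quick sketch, assuming the 11 input features shown in the SMOTE output shape:

```python
# Layer widths: 11 input features, then the Dense layers of model_5
layer_sizes = [11, 32, 20, 14, 7, 1]

# (inputs + 1 bias) * units for each consecutive pair of layers
params_per_layer = [(n_in + 1) * n_out
                    for n_in, n_out in zip(layer_sizes, layer_sizes[1:])]
total_params = sum(params_per_layer)

print(params_per_layer)  # [384, 660, 294, 105, 8]
print(total_params)      # 1451
```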
In [179]:
# Fitting the ANN without class_weight since SMOTE is already applied
start = time.time()
history_5 = model_5.fit(X_train_smote, y_train_smote,
                        validation_data=(X_val, y_val),
                        batch_size=batch_size,
                        epochs=epochs)
end = time.time()
Epoch 1/25
160/160 [==============================] - 1s 3ms/step - loss: 0.6796 - recall: 0.6529 - val_loss: 0.6683 - val_recall: 0.7423
Epoch 2/25
160/160 [==============================] - 0s 2ms/step - loss: 0.5887 - recall: 0.6664 - val_loss: 0.5286 - val_recall: 0.7270
Epoch 3/25
160/160 [==============================] - 0s 2ms/step - loss: 0.5457 - recall: 0.7106 - val_loss: 0.4740 - val_recall: 0.7055
Epoch 4/25
160/160 [==============================] - 0s 2ms/step - loss: 0.5260 - recall: 0.7290 - val_loss: 0.4863 - val_recall: 0.7454
Epoch 5/25
160/160 [==============================] - 0s 2ms/step - loss: 0.5128 - recall: 0.7433 - val_loss: 0.4728 - val_recall: 0.7362
Epoch 6/25
160/160 [==============================] - 0s 2ms/step - loss: 0.5063 - recall: 0.7518 - val_loss: 0.4558 - val_recall: 0.7209
Epoch 7/25
160/160 [==============================] - 0s 2ms/step - loss: 0.4998 - recall: 0.7677 - val_loss: 0.4652 - val_recall: 0.7362
Epoch 8/25
160/160 [==============================] - 0s 2ms/step - loss: 0.4898 - recall: 0.7677 - val_loss: 0.4495 - val_recall: 0.7301
Epoch 9/25
160/160 [==============================] - 0s 2ms/step - loss: 0.4904 - recall: 0.7639 - val_loss: 0.4785 - val_recall: 0.7577
Epoch 10/25
160/160 [==============================] - 0s 2ms/step - loss: 0.4837 - recall: 0.7708 - val_loss: 0.4505 - val_recall: 0.7423
Epoch 11/25
160/160 [==============================] - 0s 2ms/step - loss: 0.4787 - recall: 0.7688 - val_loss: 0.4602 - val_recall: 0.7485
Epoch 12/25
160/160 [==============================] - 0s 2ms/step - loss: 0.4772 - recall: 0.7753 - val_loss: 0.4378 - val_recall: 0.7086
Epoch 13/25
160/160 [==============================] - 0s 2ms/step - loss: 0.4744 - recall: 0.7749 - val_loss: 0.4435 - val_recall: 0.7178
Epoch 14/25
160/160 [==============================] - 0s 2ms/step - loss: 0.4686 - recall: 0.7773 - val_loss: 0.4521 - val_recall: 0.7270
Epoch 15/25
160/160 [==============================] - 0s 2ms/step - loss: 0.4724 - recall: 0.7684 - val_loss: 0.4466 - val_recall: 0.7147
Epoch 16/25
160/160 [==============================] - 0s 3ms/step - loss: 0.4660 - recall: 0.7786 - val_loss: 0.4393 - val_recall: 0.7178
Epoch 17/25
160/160 [==============================] - 1s 3ms/step - loss: 0.4668 - recall: 0.7765 - val_loss: 0.4465 - val_recall: 0.7117
Epoch 18/25
160/160 [==============================] - 0s 3ms/step - loss: 0.4629 - recall: 0.7875 - val_loss: 0.4296 - val_recall: 0.7025
Epoch 19/25
160/160 [==============================] - 1s 3ms/step - loss: 0.4597 - recall: 0.7947 - val_loss: 0.4293 - val_recall: 0.7209
Epoch 20/25
160/160 [==============================] - 1s 3ms/step - loss: 0.4587 - recall: 0.7879 - val_loss: 0.4450 - val_recall: 0.7393
Epoch 21/25
160/160 [==============================] - 0s 3ms/step - loss: 0.4554 - recall: 0.7867 - val_loss: 0.4346 - val_recall: 0.7209
Epoch 22/25
160/160 [==============================] - 1s 3ms/step - loss: 0.4569 - recall: 0.7834 - val_loss: 0.4329 - val_recall: 0.7362
Epoch 23/25
160/160 [==============================] - 0s 2ms/step - loss: 0.4573 - recall: 0.7898 - val_loss: 0.4498 - val_recall: 0.7577
Epoch 24/25
160/160 [==============================] - 0s 2ms/step - loss: 0.4565 - recall: 0.7947 - val_loss: 0.4267 - val_recall: 0.7055
Epoch 25/25
160/160 [==============================] - 0s 2ms/step - loss: 0.4526 - recall: 0.7955 - val_loss: 0.4472 - val_recall: 0.7454
In [180]:
print("Time taken in seconds ",end-start)
Time taken in seconds  11.28977108001709

Loss Function

In [181]:
#Plotting Train Loss vs Validation Loss
plt.plot(history_5.history['loss'])
plt.plot(history_5.history['val_loss'])
plt.title('model loss')
plt.ylabel('Loss')
plt.xlabel('Epoch')
plt.legend(['train', 'validation'], loc='upper left')
plt.show()
No description has been provided for this image

Recall

In [182]:
#Plotting Train recall vs Validation recall
plt.plot(history_5.history['recall'])
plt.plot(history_5.history['val_recall'])
plt.title('model recall')
plt.ylabel('Recall')
plt.xlabel('Epoch')
plt.legend(['train', 'validation'], loc='upper left')
plt.show()
No description has been provided for this image
Observations:

  • Training recall keeps rising across epochs (from about 0.65 to 0.80), while validation recall fluctuates between about 0.70 and 0.76 and ends at 0.75.

  • Our model is catching ~75% of churners on validation.

  • Dropout seems to have helped generalization: the gap between training and validation recall is smaller than for Model 4.
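
To make the reading of the curves concrete, the sketch below finds the best epochs directly from the validation values printed in the training log above. The two lists are copied from that log. In the notebook the same values are available as `history_5.history['val_recall']` and `history_5.history['val_loss']`.

```python
# Validation metrics per epoch, copied from the Model 5 training log
val_recall = [0.7423, 0.7270, 0.7055, 0.7454, 0.7362, 0.7209, 0.7362, 0.7301,
              0.7577, 0.7423, 0.7485, 0.7086, 0.7178, 0.7270, 0.7147, 0.7178,
              0.7117, 0.7025, 0.7209, 0.7393, 0.7209, 0.7362, 0.7577, 0.7055,
              0.7454]
val_loss = [0.6683, 0.5286, 0.4740, 0.4863, 0.4728, 0.4558, 0.4652, 0.4495,
            0.4785, 0.4505, 0.4602, 0.4378, 0.4435, 0.4521, 0.4466, 0.4393,
            0.4465, 0.4296, 0.4293, 0.4450, 0.4346, 0.4329, 0.4498, 0.4267,
            0.4472]

# Epochs are 1-based in the log, list indices are 0-based
best_recall_epoch = max(range(len(val_recall)), key=val_recall.__getitem__) + 1
best_loss_epoch = min(range(len(val_loss)), key=val_loss.__getitem__) + 1

print(best_recall_epoch, max(val_recall))  # 9 0.7577
print(best_loss_epoch, min(val_loss))      # 24 0.4267
print(min(val_recall), max(val_recall))    # 0.7025 0.7577
```

Validation recall moves within a band of about 0.055 from epoch to epoch, so the final-epoch value of 0.745 should be read as one draw from that band rather than a precise estimate. One option we did not use here is the Keras `EarlyStopping` callback with `restore_best_weights=True`, which keeps the weights from the best monitored epoch.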

In [183]:
#Predicting the results using 0.5 as the threshold for training set
y_train_pred = model_5.predict(X_train_smote)
y_train_pred = (y_train_pred > 0.5)
y_train_pred
319/319 [==============================] - 1s 2ms/step
Out[183]:
array([[ True],
       [False],
       [False],
       ...,
       [ True],
       [ True],
       [ True]])
In [184]:
#Predicting the results using 0.5 as the threshold for validation set
y_val_pred = model_5.predict(X_val)
y_val_pred = (y_val_pred > 0.5)
y_val_pred
50/50 [==============================] - 0s 1ms/step
Out[184]:
array([[False],
       [False],
       [False],
       ...,
       [False],
       [ True],
       [ True]])
In [185]:
model_name = "NN with SMOTE and Adam Plus Dropout"

train_metric_df.loc[model_name] = recall_score(y_train_smote, y_train_pred)
valid_metric_df.loc[model_name] = recall_score(y_val, y_val_pred)

Classification Report

In [186]:
#Classification report on training set
cr = classification_report(y_train_smote, y_train_pred)
print(cr)
              precision    recall  f1-score   support

         0.0       0.81      0.81      0.81      5096
         1.0       0.81      0.81      0.81      5096

    accuracy                           0.81     10192
   macro avg       0.81      0.81      0.81     10192
weighted avg       0.81      0.81      0.81     10192

In [187]:
#classification report on validation set
cr=classification_report(y_val,y_val_pred)
print(cr)
              precision    recall  f1-score   support

         0.0       0.92      0.80      0.86      1274
         1.0       0.49      0.75      0.59       326

    accuracy                           0.79      1600
   macro avg       0.71      0.77      0.72      1600
weighted avg       0.84      0.79      0.80      1600
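
The f1-score column is the harmonic mean of precision and recall, F1 = 2PR / (P + R). A quick check against the validation report above, using the rounded precision and recall it prints (so the last digit could differ from scikit-learn's unrounded calculation in other cases):

```python
def f1_from(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

f1_churn = f1_from(0.49, 0.75)    # class 1.0 (Exited)
f1_stayed = f1_from(0.92, 0.80)   # class 0.0 (Stayed)

print(round(f1_churn, 2))   # 0.59
print(round(f1_stayed, 2))  # 0.86
```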

In [188]:
# Generate a Classification Heatmap
report = classification_report(y_val, y_val_pred, output_dict=True)
df_report = pd.DataFrame(report).transpose()

plt.figure(figsize=(8, 5))
sns.heatmap(df_report.iloc[:-1, :-1], annot=True, cmap="YlGnBu", fmt=".2f")
plt.title("Classification Report Heatmap")
plt.show()
No description has been provided for this image
In [189]:
# Generate ROC Curve to show trade-off between True Positive Rate and False Positive Rate.

y_scores = model_5.predict(X_val).ravel()
fpr, tpr, thresholds = roc_curve(y_val, y_scores)
roc_auc = auc(fpr, tpr)

plt.figure()
plt.plot(fpr, tpr, label=f"AUC = {roc_auc:.2f}")
plt.plot([0, 1], [0, 1], linestyle='--')
plt.xlabel("False Positive Rate")
plt.ylabel("True Positive Rate")
plt.title("ROC Curve")
plt.legend()
plt.grid(True)
plt.show()
50/50 [==============================] - 0s 1ms/step
No description has been provided for this image

Focus on the Business Goal: We're predicting customer churn (1.0 = exited) — so catching as many churners as possible is key.

That makes Recall for class 1.0 (churners) the most important metric to watch.

This NN model using Balanced Data with SMOTE and Adam Plus Dropout shows improved results over the SMOTE and Adam model without Dropout (Model 4).

For the Validation Set:

  • Class 0.0 (Stayed): Precision=0.92, Recall=0.80

  • Class 1.0 (Exited): Precision=0.49, Recall=0.75

  • Recall = 0.75 → We're identifying 75%, or 3 out of 4, of actual churners.

  • Model 5 improves on Model 4 in recall (0.67 → 0.75), precision (0.48 → 0.49), and F1 (0.56 → 0.59).

  • AUC reflects the model's ability to rank churners above non-churners, and my AUC rising from 0.83 → 0.86 shows improved ranking ability as well.
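
To make the "ranking" reading of AUC concrete: AUC equals the fraction of (churner, non-churner) pairs in which the churner gets the higher predicted score, with ties counted as half. The sketch below computes it that way on a small made-up example. The labels and scores are illustrative only, not model output, and `pairwise_auc` is a hypothetical helper rather than a scikit-learn function.

```python
def pairwise_auc(y_true, scores):
    """AUC as the share of positive/negative pairs ranked correctly (ties count 0.5)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy labels (1 = churned) and predicted churn probabilities
toy_labels = [1, 1, 1, 1, 0, 0, 0, 0, 0, 1]
toy_scores = [0.9, 0.8, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1, 0.05, 0.35]

toy_auc = pairwise_auc(toy_labels, toy_scores)
print(toy_auc)  # 0.92 (23 of the 25 pairs are ordered correctly)
```

Read this way, a validation AUC of 0.86 means a randomly chosen churner outranks a randomly chosen non-churner about 86% of the time, independent of the 0.5 threshold.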

Confusion matrix

In [190]:
#Calculating the confusion matrix on training set
make_confusion_matrix(y_train_smote, y_train_pred)
No description has been provided for this image
In [191]:
#Calculating the confusion matrix on validation set
make_confusion_matrix(y_val,y_val_pred)
No description has been provided for this image

** Observations **

  • With a recall of 75%, this model identifies 3 out of 4 customers at risk of churn. Precision of 49% means about half of the customers flagged as churners actually churn. Recall (for churners) = TP / (TP + FN) = 243 / (243 + 83) ≈ 0.75
  • Training recall (0.81) is close to validation recall (0.75), so the model generalizes reasonably well while keeping churn recall high.
  • Model 5 is the most balanced: it has the highest F1 (0.59) and AUC (0.86) of the six models, and it identifies 75% of churners with nearly 50% precision. Overall accuracy is 79%.
  • Model 2 with Adam Optimizer, Class Weights, and Dropout had the highest Recall Score of 80%.
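
The confusion-matrix counts behind the recall figures can be recovered from numbers already printed in this notebook: recall × support gives TP, and support − TP gives FN. A small sketch, using the validation recalls from the comparison table further below and the churn-class support of 326 (`counts_from_recall` is a hypothetical helper name):

```python
def counts_from_recall(recall, support):
    """Recover (TP, FN) for the positive class from its recall and support."""
    tp = round(recall * support)
    return tp, support - tp

tp_m5, fn_m5 = counts_from_recall(0.745399, 326)  # Model 5: SMOTE + Adam + Dropout
tp_m4, fn_m4 = counts_from_recall(0.668712, 326)  # Model 4: SMOTE + Adam

print(tp_m5, fn_m5)  # 243 83
print(tp_m4, fn_m4)  # 218 108
```

So Model 5 catches 25 more churners than Model 4 on the same validation set (243 vs. 218 out of 326).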

Model Performance Comparison and Final Model Selection¶

In [192]:
print("Training performance comparison")
train_metric_df
Training performance comparison
Out[192]:
recall
NN with SGD 0.480828
NN with Adam 0.773773
NN with Adam and Dropout 0.804448
NN with SMOTE and SGD 0.586146
NN with SMOTE and Adam 0.853218
NN with SMOTE and Adam Plus Dropout 0.806319
In [193]:
print("Validation set performance comparison")
valid_metric_df
Validation set performance comparison
Out[193]:
recall
NN with SGD 0.487730
NN with Adam 0.757669
NN with Adam and Dropout 0.803681
NN with SMOTE and SGD 0.527607
NN with SMOTE and Adam 0.668712
NN with SMOTE and Adam Plus Dropout 0.745399
In [194]:
# Create a Barplot of the above Validation Data performance.
# Define model comparison data
data = {
    "Model": [
        "Model 0: NN using SGD + Class Weights",
        "Model 1: NN using Adam + Class Weights",
        "Model 2: NN using Adam + Class Weights + Dropout",
        "Model 3: NN using Balanced Data SMOTE + SGD",
        "Model 4: NN using Balanced Data SMOTE + Adam",
        "Model 5: NN using Balanced Data SMOTE + Adam + Dropout"
    ],
    "Recall (Churn)": [0.49, 0.76, 0.80, 0.53, 0.67, 0.75]
}

# Create DataFrame
df_models = pd.DataFrame(data)

# Set seaborn style
sns.set(style="whitegrid")

# Identify the index of the best model
best_model_idx = df_models["Recall (Churn)"].idxmax()

# Create the barplot
plt.figure(figsize=(10, 6))
barplot = sns.barplot(x="Recall (Churn)", y="Model", data=df_models, palette="Blues_d")

# Highlight the best model in orange
barplot.patches[best_model_idx].set_color('orange')

# Add text labels to bars
for i, p in enumerate(barplot.patches):
    width = p.get_width()
    plt.text(width + 0.01, p.get_y() + p.get_height() / 2,
             f'{width:.2f}', va='center')

# Final plot adjustments
plt.title("Recall (Churn) by Model", fontsize=14)
plt.xlabel("Recall (Churn)")
plt.ylabel("Model")
plt.tight_layout()
plt.show()
No description has been provided for this image
In [202]:
# Final model metrics
data = {
    "Model": [
        "M0: SGD + CW",
        "M1: Adam + CW",
        "M2: Adam + CW + Dropout",
        "M3: SMOTE + SGD",
        "M4: SMOTE + Adam",
        "M5: SMOTE + Adam + Dropout"
    ],
    "Recall": [0.49, 0.76, 0.80, 0.53, 0.67, 0.75 ],
    "Precision": [0.22, 0.46, 0.44, 0.38, 0.48, 0.49 ],
    "F1 Score": [0.30, 0.57, 0.57, 0.45, 0.56, 0.59 ],
    "AUC": [0.52, 0.85, 0.84, 0.73, 0.83, 0.86 ]
}

# Create DataFrame
df_metrics = pd.DataFrame(data)
df_metrics.set_index("Model", inplace=True)

# Plot grouped bar chart
ax = df_metrics.plot(kind="bar", figsize=(12, 6), colormap="Set2", edgecolor="black")

# Styling
plt.title("Final Model Comparison Across Metrics", fontsize=14)
plt.ylabel("Score")
plt.ylim(0, 1)
plt.grid(axis='y')
plt.xticks(rotation=45, ha='right')
plt.tight_layout()
plt.legend(title="Metric")
plt.show()
No description has been provided for this image

Observations:

  • Model 2: NN using Adam + Class Weights + Dropout had the best Recall value of 80%.
In [195]:
train_metric_df - valid_metric_df
Out[195]:
recall
NN with SGD -0.006902
NN with Adam 0.016104
NN with Adam and Dropout 0.000767
NN with SMOTE and SGD 0.058539
NN with SMOTE and Adam 0.184507
NN with SMOTE and Adam Plus Dropout 0.060920

Observations:

  • Model 2: NN using Adam + Class Weights + Dropout had the smallest gap between training and validation sets of 0.000767.
  • This model showed the best Generalization.
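
When comparing gaps it is safer to rank by the absolute difference, since a negative gap (validation better than training, as for the SGD model) would otherwise sort first. A plain-Python sketch using the recall values from the two comparison tables above:

```python
train_recall = {
    "NN with SGD": 0.480828,
    "NN with Adam": 0.773773,
    "NN with Adam and Dropout": 0.804448,
    "NN with SMOTE and SGD": 0.586146,
    "NN with SMOTE and Adam": 0.853218,
    "NN with SMOTE and Adam Plus Dropout": 0.806319,
}
valid_recall = {
    "NN with SGD": 0.487730,
    "NN with Adam": 0.757669,
    "NN with Adam and Dropout": 0.803681,
    "NN with SMOTE and SGD": 0.527607,
    "NN with SMOTE and Adam": 0.668712,
    "NN with SMOTE and Adam Plus Dropout": 0.745399,
}

# Absolute train/validation gap per model
abs_gap = {name: abs(train_recall[name] - valid_recall[name]) for name in train_recall}

smallest_gap_model = min(abs_gap, key=abs_gap.get)
largest_gap_model = max(abs_gap, key=abs_gap.get)
print(smallest_gap_model)  # NN with Adam and Dropout
print(largest_gap_model)   # NN with SMOTE and Adam
```

With the notebook's DataFrames, `(train_metric_df - valid_metric_df).abs()` would give the same ranking.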
In [196]:
y_test_pred = model_2.predict(X_test)
y_test_pred = (y_test_pred > 0.5)
print(y_test_pred)
63/63 [==============================] - 0s 2ms/step
[[False]
 [False]
 [False]
 ...
 [ True]
 [False]
 [False]]
In [197]:
#lets print classification report
cr=classification_report(y_test,y_test_pred)
print(cr)
              precision    recall  f1-score   support

         0.0       0.93      0.73      0.82      1593
         1.0       0.43      0.80      0.56       407

    accuracy                           0.74      2000
   macro avg       0.68      0.76      0.69      2000
weighted avg       0.83      0.74      0.77      2000

Observations:

  • Recall holds steady at 80% on the Test Set (80% on validation).
  • Precision for the Churn Class is also steady at 43% (44% on validation).
  • The model seems to be generalizing well.
In [198]:
#Calculating the confusion matrix
make_confusion_matrix(y_test,y_test_pred)
No description has been provided for this image
In [199]:
# Get predicted probabilities for the positive class (churn)
y_test_pred_proba = model_2.predict(X_test).ravel()

# Compute FPR, TPR, and thresholds
fpr, tpr, thresholds = roc_curve(y_test, y_test_pred_proba)
roc_auc = auc(fpr, tpr)

# Plot the ROC Curve
plt.figure(figsize=(8, 6))
plt.plot(fpr, tpr, color='darkorange', lw=2, label=f'ROC Curve (AUC = {roc_auc:.2f})')
plt.plot([0, 1], [0, 1], color='navy', linestyle='--', label='Random Guessing')

plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate (Recall)')
plt.title('ROC Curve - Test Set')
plt.legend(loc='lower right')
plt.grid(True)
plt.tight_layout()
plt.show()
63/63 [==============================] - 0s 1ms/step
No description has been provided for this image
In [201]:
precision, recall, thresholds = precision_recall_curve(y_test, y_test_pred_proba)

plt.figure(figsize=(8, 6))
plt.plot(recall, precision, color="teal", lw=2)
plt.xlabel("Recall")
plt.ylabel("Precision")
plt.title("Precision-Recall Curve - Test Set")
plt.grid(True)
plt.tight_layout()
plt.show()
No description has been provided for this image

Observations

  • With a recall of 80%, this model identifies 4 out of 5 customers at risk of churn. Precision of 43% means fewer than half of the flagged customers actually churn, which is the cost of favoring recall. Recall (for churners) = TP / (TP + FN) = 325 / (325 + 82) ≈ 0.80, where 325 + 82 = 407 is the number of actual churners in the test set.
  • Test recall matches validation recall (both 80%), so we're maintaining strong generalization and avoiding overfitting while maximizing churn recall.
  • The ROC Curve shows an AUC of 0.85.
  • An AUC of 0.85 means that a randomly chosen churner receives a higher predicted risk than a randomly chosen non-churner about 85% of the time. This supports the model’s utility in prioritizing outreach efforts based on predicted risk.
  • The Precision-Recall Curve shows the expected inverse relationship.
    • As recall increases, precision drops.
    • We can capture more churners, but at the cost of more false positives
  • We can choose a threshold that balances recall vs. precision depending on our business objective.
    • For example: if we want ~70% recall, this curve shows we'll get ~60% precision.
    • At ~80% recall, precision drops to ~43%, which aligns with our reported score.
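
Choosing a threshold for a target recall can be sketched as follows: walk the candidate thresholds from high to low and stop at the first one whose recall reaches the target, which is also the highest such threshold. The labels and scores below are made up for illustration and `threshold_for_recall` is a hypothetical helper. In the notebook, the arrays returned by `precision_recall_curve` carry the same information. Note that this sketch flags scores `>=` the threshold, whereas the cells above use `> 0.5`.

```python
def threshold_for_recall(y_true, scores, target_recall):
    """Return (threshold, precision, recall) for the highest threshold reaching target_recall."""
    n_pos = sum(y_true)
    for thr in sorted(set(scores), reverse=True):
        flagged = [s >= thr for s in scores]
        tp = sum(1 for y, f in zip(y_true, flagged) if y == 1 and f)
        fp = sum(1 for y, f in zip(y_true, flagged) if y == 0 and f)
        recall = tp / n_pos
        if recall >= target_recall:
            return thr, tp / (tp + fp), recall
    return None  # target recall not reachable (e.g. no positives scored)

# Toy labels (1 = churned) and predicted churn probabilities
toy_labels = [1, 1, 1, 1, 0, 0, 0, 0, 0, 1]
toy_scores = [0.9, 0.8, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1, 0.05, 0.35]

chosen = threshold_for_recall(toy_labels, toy_scores, target_recall=0.8)
print(chosen)  # (0.4, 0.8, 0.8)
```

Any threshold chosen this way should be picked on the validation set and then confirmed on the test set, so the test metrics stay an unbiased estimate.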

Actionable Insights and Business Recommendations¶

Observations

  • This model catches 80% of churners and flags them with 43% precision, making it a useful churn prediction engine for a recall-focused strategy. Although more than half of the flagged customers would not actually have left, the model lets the bank intervene early on 8 out of 10 customers likely to leave, which can reduce potential revenue loss.

Key Business Insights from Churn Prediction Analysis¶

  • The final model (NN with Adam + Class Weights + Dropout) achieved 80% recall on churners and an AUC of 0.85 on the test set, indicating strong separation between churners and loyal customers.

  • This means 8 out of 10 customers who will churn can be flagged early — a major opportunity for targeted retention efforts.

False Alarms Are Reasonable for a Recall-Focused Strategy¶

  • While precision is only ~43% on the test set (~44% on validation), this may be acceptable in scenarios like churn prevention where missing a churner is more costly than wrongly predicting churn.

  • The model favors catching at-risk customers, even if some loyal customers are mistakenly flagged.

Recommendations for the Business¶

  1. Deploy the Model for Retention Campaign Targeting
  2. Use the model to score customers weekly/monthly
  3. Prioritize the top 20–30% highest churn risk scores for retention outreach
  4. Segment high-risk churners into A/B groups to test which retention offers work
  5. Create personalized retention offers to the high risk churn group.
  6. Integrate Model Into CRM or Customer Analytics
  7. Alert customer success teams when a customer crosses a churn risk threshold
  8. Retrain the model every 3–6 months to reflect changing customer behavior and campaign effectiveness
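
As a rough illustration of recommendation 3, the sketch below ranks customers by predicted churn probability and keeps the top fraction for outreach. The customer IDs and scores are invented for the example, and `top_risk_customers` is a hypothetical helper. In practice the scores would come from `model_2.predict` on the current customer base.

```python
import math

def top_risk_customers(scores_by_customer, fraction):
    """Return the customer IDs with the highest churn scores (top `fraction` of the list)."""
    n_select = math.ceil(len(scores_by_customer) * fraction)
    ranked = sorted(scores_by_customer, key=scores_by_customer.get, reverse=True)
    return ranked[:n_select]

# Invented churn probabilities for ten hypothetical customers
example_scores = {
    "C001": 0.91, "C002": 0.12, "C003": 0.55, "C004": 0.78, "C005": 0.05,
    "C006": 0.33, "C007": 0.67, "C008": 0.21, "C009": 0.84, "C010": 0.44,
}

outreach_list = top_risk_customers(example_scores, fraction=0.2)
print(outreach_list)  # ['C001', 'C009']
```

Ranking by score rather than applying the fixed 0.5 cut-off lets the outreach volume match the retention team's capacity.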

Early Detection.png

In [211]:
# Converting notebook to .html format for upload:

# Step 1: Copy the notebook locally
!cp '/content/drive/MyDrive/Colab_Notebooks/mod4_proj4_Full_code_Thomas_Hall.ipynb' "/content/"

# Step 2: Convert to HTML
!jupyter nbconvert --to html '/content/mod4_proj4_Full_code_Thomas_Hall.ipynb'