Bank Analytics
Introduction
During my project, I had the thrilling opportunity to explore a bank dataset and uncover the patterns behind loan defaulters. It felt like being given a puzzle, and I was eager to solve it.
This post offers a glimpse into my project, where I’ll discuss the challenges and successes of analyzing data to predict loan defaults. Let’s dive into the realm of numbers together and uncover the narratives hidden within the data. Join me as I simplify the complexities, sharing the lessons learned and insights gained throughout the process.
Dataset Summary
The dataset for this project includes personal details, professional backgrounds, and credit information of the bank’s customers. It comprises two files: Finance_1 and Finance_2. Each File containing 39,000+ records and sharing a common column ID.
Project Goal
The primary objective was to transform key performance indicators (KPIs) into intuitive visualizations. The aim was to create clear and comprehensible charts and graphs that addressed the specified KPIs, making it easy for everyone to understand the company’s performance. This goal challenged me to refine my data analysis skills and effectively communicate insights visually, thereby facilitating better-informed decision-making based on the data.
KPIs
- Year wise loan amount Stats
- Grade and sub grade wise revol_bal
- Total Payment for Verified Status Vs Total Payment for Non-Verified Status
- State wise and month wise loan status
- Home ownership Vs last payment date stats
Data Preparation
In preparing our dataset for analysis, performed process of cleaning and transforming raw data prior to processing and analysis, the initial and crucial step involved merging the two datasets. This was followed by meticulous data cleaning to ensure the integrity and relevance of the information. Here are the key actions taken:
1. Merging Datasets: Combined Finance_1 and Finance_2 using the common ID column.
2. Handling Missing (Empty or Null) Values: Identified and addressed any missing data to prevent biases. In cases where the proportion of missing data was high and imputation was not feasible, removed those columns or records to maintain data quality.
Such as:
i. Null value settled: “emp_length” (here n/a to NA)
ii. Null value: “revol_util” (FILLED NULL values to 0), “last_pymnt_d” (handled with Date: 4/1/2013 using median formula, “next_pymnt_d” (filled NULL values to NA), “last_credit_pull_d” handled with Date: 3/1/2015 using median formula
3. Removed Duplicate or Irrelevant Column: Removed any columns that were exact duplicates of others to avoid redundancy. Eliminated columns that did not contribute valuable information to the analysis or were outside the scope of the project goals.
Such as: “emp_title”, “Pymnt_plan”, “desc”, “purpose” and “title”
4. Removing Redundant Columns: Removed columns that were deemed redundant or provided no additional value to the analysis.
Such as: “Pymnt_plan”
5. Data Consistency: Ensured consistency in data formats and units across the dataset.
These data cleaning steps were essential in making sure the dataset was devoid of redundancies, irrelevant data, and missing values. This meticulous preparation paved the way for a more precise and meaningful analysis of the selected key performance indicators.
Dashboard
KPI 1:
Year wise loan amount Stats
Analyzing Loan Amounts Over Time
The first key performance indicator (KPI) focuses on examining the relationship between the loan amounts taken by customers and the issuance dates (issue_d). This KPI compares the loan amounts taken by customers year over year. To visualize the trends in loan amounts — whether they have increased or decreased — a line chart is used.
KPI 2:
Year wise loan amount Stats
Analyzing Revolving Balance by Grade and Sub-Grade
The second key performance indicator (KPI) aims to understand the relationship between grade, sub-grade, and revolving balance (revol_bal). This KPI highlights the total revolving balance contributed by each sub-grade and grade. To facilitate easy interpretation, a matrix table is used, with data bars indicating the revolving balance corresponding to each grade and sub-grade.
KPI 3:
Total Payment for Verified Status Vs Total Payment for Non-Verified Status
Contribution of Verified vs. Non-Verified Customers to Total Payments
The third key performance indicator (KPI) examines the relationship between the loan amounts taken by customers and the issuance dates (issue_d). This KPI specifically focuses on comparing the contributions of verified versus non-verified customers to the total payments. A donut chart is used to effectively display the proportions of total payments made by verified and non-verified customers.
KPI 4:
State wise and month wise loan status
State-Wise Loan Status
A stacked column chart is utilized to represent the loan status across different states. This visualization helps in comparing the distribution of various loan statuses for each state.
Loan Status by Last Credit Pull Date
To illustrate the loan status based on the last credit pull date (last_credit_pull_d), a line chart is used. This chart shows how the loan status has changed over time, providing insights into trends and patterns related to credit pulls.
Filter Used — Month and Loan Status
KPI 5:
Home ownership Vs last payment date stats
Home Ownership and Last Payment Date
The fifth key performance indicator (KPI) explores the relationship between the home ownership status of customers and their last payment date. This KPI compares the last payment dates of customers across different home ownership statuses, providing insights into payment behaviors in relation to home ownership.
Filter Used — Year, Quarter, Month
Additional Key Performance Indicators
- Sum of Loan Amount: Calculated the total loan amount taken by customers.
2. Count of Issuance Dates (issue_d): Counted the number of loan issuance dates to understand the frequency of loans over time.
3. Sum of Funded Amount: Summed the total funded amount to gauge the overall financial support provided.
4. Average Interest Rate (int_rate): Computed the average interest rate to assess the cost of borrowing for customers.
5. Average Funded Amount: Determined the average funded amount per loan to analyze the typical loan size.
Key Findings
1. Yearly Loan Amount Trends: There is a clear and gradual increase in the loan amount taken by customers from one year to the next.
2. Revolving Balance by Grade: Customers with grade B have the highest revolving balance (revol_bal), while those with grade G have the lowest.
3. Payments by Verification Status: Verified accounts, which make up 41.12% of the total, have contributed $153.54 million in total payments. Non-verified accounts, representing 58.88%, have contributed $219.89 million, indicating a clear distinction in total payments based on verification status.
4. Loan Status Distribution: “Fully Paid” has the highest count of loan statuses at 32,950, followed by “Charged Off” at 5,627, and “Current” at 1,140.
5. Home Ownership and Last Payment: Across all five home ownership categories, the count of last payments ranges from 3 to 18,899. “Rent” accounts for 47.54% of last payments, while “None” has the least count of last payments.
Conclusion
1. Yearly Loan Amount Trends:
The consistent and gradual increase in the loan amount taken by customers each year indicates growing demand for loans. This trend could be driven by increasing financial needs or confidence in the lending institution. Continuous growth in loan amounts suggests the need for robust financial planning and risk management to accommodate this increasing demand.
2. Revolving Balance by Grade:
Customers with grade B having the highest revolving balance, while those with grade G having the lowest, highlights the variance in credit usage across different credit grades. This suggests that grade B customers are more reliant on revolving credit, which may warrant closer monitoring and potentially targeted financial management assistance. Conversely, grade G customers might either have lower credit limits or be more conservative in their credit usage.
3. Payments by Verification Status:
Verified accounts contribute significantly to total payments, but non-verified accounts, despite representing a larger percentage of the total, contribute even more. This distinction highlights the importance of verification in understanding and managing payment behaviors. Encouraging more customers to verify their accounts could improve payment reliability and reduce risk.
4. Loan Status Distribution:
The high number of “Fully Paid” loans is a positive indicator of customer repayment behavior. However, the significant number of “Charged Off” loans points to a need for improved credit risk assessment and management strategies. The relatively low number of “Current” loans suggests fewer ongoing repayments, which could affect future cash flows.
5. Home Ownership and Last Payment:
The high proportion of last payments from renters indicates that renters might have a higher dependency on loans compared to homeowners. This could reflect underlying financial instability or limited access to other forms of credit. Tailoring loan products to better suit the needs of renters, while also managing risk, could improve overall financial health and repayment rates. The minimal last payments from those categorized as “None” suggests they are a smaller or more financially stable group within the customer base.
Overall Conclusion
The analysis highlights several key areas for strategic focus:
Loan Demand Management: The increasing loan amounts necessitate robust risk management and resource allocation strategies to sustain growth.
Credit Utilization: Variations in revolving balances by grade indicate different credit behaviors, necessitating tailored financial products and support for higher-risk groups.
Verification and Risk Mitigation: Encouraging account verification could enhance payment reliability and reduce financial risk.
Credit Risk Management: Addressing the high number of charged-off loans through improved credit assessments and support mechanisms can mitigate potential losses.
Tailored Loan Products: Customizing loan products to address the needs of renters and other specific groups can enhance financial stability and repayment performance.