Retail Analytics · 2025
Customer Segmentation & Churn Prediction
Finding the 29% of customers who drive 83% of revenue
Apache Spark · Spark MLlib · MLflow · Databricks · Unity Catalog · Python
Problem
An online retailer needed to know which customers were worth real retention spend. Treating everyone the same wastes budget. Going by gut misses revenue. The job was to segment by value and behavior, then flag the customers most likely to drop off.
Approach
- Cleaned 1.07M raw transactions (UCI Online Retail II) down to 779,425 usable rows across 5,878 customers.
- Engineered 13 customer-level features blending RFM (Recency, Frequency, Monetary) with behavioral and temporal signals.
- Ran K-Means and chose the number of clusters by silhouette analysis. Settled on k = 3 segments, with a silhouette score of 0.5748.
- Built a churn classifier on top of the segments to flag at-risk customers each week.
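As a rough illustration of the cleaning and RFM steps above, here is a minimal sketch in plain Python rather than Spark. The row layout, the sample rows, the snapshot date, and the filtering rules (drop rows without a customer ID, cancellation invoices prefixed with "C", and non-positive quantities or prices) are assumptions for illustration. The project's actual cleaning rules and its other ten features are not described here.

```python
from collections import defaultdict
from datetime import date

# Hypothetical rows: (customer_id, invoice, invoice_date, quantity, unit_price).
transactions = [
    ("u1", "536365", date(2011, 11, 1), 6, 2.55),
    ("u1", "536390", date(2011, 12, 1), 2, 10.00),
    ("u2", "536401", date(2011, 6, 15), 1, 4.25),
    (None, "536402", date(2011, 6, 16), 3, 1.00),     # missing customer ID
    ("u2", "C536403", date(2011, 6, 17), -1, 4.25),   # cancellation invoice
]


def is_usable(row):
    """Assumed cleaning rules; the write-up does not list the real ones."""
    customer_id, invoice, _, quantity, unit_price = row
    return (
        customer_id is not None
        and not invoice.startswith("C")
        and quantity > 0
        and unit_price > 0
    )


def rfm_features(rows, snapshot):
    """Return recency (days), frequency (distinct invoices), and monetary value per customer."""
    last_purchase = {}
    invoices = defaultdict(set)
    monetary = defaultdict(float)
    for customer_id, invoice, invoice_date, quantity, unit_price in filter(is_usable, rows):
        previous = last_purchase.get(customer_id)
        if previous is None or invoice_date > previous:
            last_purchase[customer_id] = invoice_date
        invoices[customer_id].add(invoice)
        monetary[customer_id] += quantity * unit_price
    return {
        cid: {
            "recency_days": (snapshot - last_purchase[cid]).days,
            "frequency": len(invoices[cid]),
            "monetary": round(monetary[cid], 2),
        }
        for cid in last_purchase
    }


# The snapshot date is an assumed reference point for measuring recency.
features = rfm_features(transactions, snapshot=date(2011, 12, 10))
for cid, row in sorted(features.items()):
    print(cid, row)
```

At the scale of the real dataset the same per-customer aggregation would be a group-by in Spark. The plain-Python version only makes the three definitions concrete.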
Insights
- 1,700 'High-Value Loyal' customers (28.9%) drove 82.8% of revenue at an average spend of £8,467.
- 2,342 'Growth / Mid-Tier' customers were the highest conversion opportunity. Most retention budget should target this group.
- 1,836 'At-Risk / Dormant' customers contributed only 5.9% of revenue. Win-back ROI on this group needed careful scoping.
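The concentration in these figures can be checked with a few lines of arithmetic. The sketch below uses only the counts and revenue shares quoted above. The Growth / Mid-Tier revenue share is not stated, so it is inferred as the remainder (assuming the three segments account for all revenue), and the `concentration` ratio is an illustrative measure, not one the project reports.

```python
# Counts and revenue shares as quoted in the Insights list.
# revenue_share=None marks the value that the write-up does not state.
segments = {
    "High-Value Loyal": {"customers": 1700, "revenue_share": 0.828},
    "Growth / Mid-Tier": {"customers": 2342, "revenue_share": None},
    "At-Risk / Dormant": {"customers": 1836, "revenue_share": 0.059},
}

total_customers = sum(s["customers"] for s in segments.values())

# Assumption: the three segments cover all revenue, so the missing share
# is whatever remains after the two stated shares.
known_share = sum(
    s["revenue_share"] for s in segments.values() if s["revenue_share"] is not None
)
segments["Growth / Mid-Tier"]["revenue_share"] = round(1.0 - known_share, 3)


def concentration(segment):
    """Revenue share divided by customer share; above 1 means over-represented in revenue."""
    customer_share = segment["customers"] / total_customers
    return segment["revenue_share"] / customer_share


for name, seg in segments.items():
    customer_pct = 100 * seg["customers"] / total_customers
    revenue_pct = 100 * seg["revenue_share"]
    print(f"{name}: {customer_pct:.1f}% of customers, "
          f"{revenue_pct:.1f}% of revenue, ratio {concentration(seg):.2f}")
```

A ratio well above 1 for the top segment and well below 1 for the other two is the numerical form of the headline claim.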
Impact
The pipeline gives the marketing team a weekly prioritized list of at-risk customers. The churn classifier reached an AUC-ROC of 0.7716 on the holdout set. The bigger win was concentrating retention spend on the top segment instead of spreading it evenly across the whole base.
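For readers less familiar with the metric: AUC-ROC is the probability that a randomly chosen churner receives a higher score than a randomly chosen non-churner, so 0.7716 means the model ranks such a pair correctly about 77% of the time. The sketch below computes it directly from that definition and then builds a small ranked list. The labels, scores, field names, and the ranking rule (churn score first, spend as tie-breaker) are hypothetical, not the project's actual data or logic.

```python
def auc_roc(labels, scores):
    """AUC-ROC as the fraction of (churner, non-churner) pairs ranked correctly.

    Ties count as half a correct pair. This pairwise form is quadratic in the
    number of customers, so it suits small illustrations only.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one churned and one retained customer")
    correct = sum((p > n) + 0.5 * (p == n) for p in positives for n in negatives)
    return correct / (len(positives) * len(negatives))


def weekly_priority_list(scored_customers, top_n):
    """Rank by churn score, breaking ties by spend (an assumed rule)."""
    ranked = sorted(
        scored_customers,
        key=lambda row: (row["churn_score"], row["monetary"]),
        reverse=True,
    )
    return ranked[:top_n]


# Hypothetical holdout: 1 = churned, 0 = retained.
labels = [1, 0, 1, 0, 0, 1]
scores = [0.9, 0.2, 0.6, 0.7, 0.1, 0.4]
auc = auc_roc(labels, scores)
print(f"AUC-ROC: {auc:.4f}")

# Hypothetical scored customers for one weekly run.
scored = [
    {"customer_id": "u1", "churn_score": 0.35, "monetary": 820.0},
    {"customer_id": "u2", "churn_score": 0.80, "monetary": 150.0},
    {"customer_id": "u3", "churn_score": 0.80, "monetary": 2400.0},
    {"customer_id": "u4", "churn_score": 0.10, "monetary": 9100.0},
]
top = weekly_priority_list(scored, top_n=2)
print([row["customer_id"] for row in top])
```

In a Spark pipeline the metric would normally come from a library evaluator rather than a hand-rolled loop. The pairwise version is shown only because it matches the definition one to one.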