Customer Segmentation Analysis through RFM & Machine Learning

Executive Summary

  • Industry: E-commerce Retail
  • Business Problem: Rising customer acquisition costs and undifferentiated marketing spend limited retention, personalization, and customer lifetime value (CLV) growth.
  • Objective: Identify distinct customer segments based on purchasing behavior to enable targeted marketing, retention, and growth strategies.
  • Data and Methods: 12 months of transaction data (UCI Online Retail II dataset) analyzed using RFM (Recency, Frequency, Monetary) feature engineering and K-Means clustering, supported by exploratory analysis in Python.
  • Key Insights: A strong Pareto effect emerged: a small base of high-value customers generated a disproportionately high share of revenue, while over one-third of customers were inactive or at risk of churn.
  • Business Impact: Delivered a segmentation framework that supports personalized marketing, CLV growth, and more efficient allocation of marketing resources.

Business Problem

NovaCart operates in a highly competitive UK e-commerce environment where customer acquisition costs continue to rise and customer loyalty is increasingly fragile. Despite having rich transactional data, marketing efforts largely treated customers uniformly, resulting in:

  • Marketing spend diluted across high- and low-value customers
  • Limited personalization across the customer journey
  • Inconsistent re-engagement of churned or inactive buyers

The marketing lead needed a data-driven approach to understand the customer base and identify where marketing investment would generate the highest return.

Objectives

The customer segmentation analysis was designed as a decision-support framework to help marketing executives answer four questions:

  • Which customer segments contribute the most to revenue and long-term value?
  • Which customers are at risk of churn and should be prioritized for reactivation?
  • How should marketing and customer journey strategies differ by segment?
  • How can segmentation be embedded into CRM and campaign workflows?

Analytical Approach

  • Data cleaning and validation to remove cancellations, returns, and invalid records.
  • Customer-level aggregation and RFM feature engineering.
  • Standardization of features to ensure fair distance-based clustering.
  • Exploratory Data Analysis (EDA) on seasonality, geography, and acquisition channels.
  • K-Means clustering, using the Elbow and Silhouette methods to select the number of clusters (K).
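The cleaning and RFM aggregation steps above can be sketched with the standard library alone. The sketch assumes that cancelled invoices carry a "C" prefix, which is how the UCI dataset description marks them, and that returns appear as non-positive quantities. The Txn record and the function names are hypothetical, and a production pipeline would more likely use pandas.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict, List, Optional, Tuple


@dataclass
class Txn:
    """One invoice line (hypothetical record loosely mirroring Online Retail II columns)."""
    invoice: str
    customer_id: Optional[str]
    quantity: int
    price: float
    invoice_date: datetime


def clean(txns: List[Txn]) -> List[Txn]:
    """Drop cancellations ('C'-prefixed invoices), returns (quantity <= 0),
    non-positive prices, and lines without a customer ID."""
    return [
        t for t in txns
        if t.customer_id is not None
        and not t.invoice.startswith("C")
        and t.quantity > 0
        and t.price > 0
    ]


def rfm_table(txns: List[Txn], snapshot: datetime) -> Dict[str, Tuple[int, int, float]]:
    """Aggregate cleaned lines to (recency_days, frequency, monetary) per customer.

    Recency   = days from the customer's last purchase to `snapshot`.
    Frequency = number of distinct invoices.
    Monetary  = total spend (quantity * price).
    """
    last: Dict[str, datetime] = {}
    invoices: Dict[str, set] = {}
    spend: Dict[str, float] = {}
    for t in txns:
        cid = t.customer_id
        if cid not in last or t.invoice_date > last[cid]:
            last[cid] = t.invoice_date
        invoices.setdefault(cid, set()).add(t.invoice)
        spend[cid] = spend.get(cid, 0.0) + t.quantity * t.price
    return {
        cid: ((snapshot - last[cid]).days, len(invoices[cid]), round(spend[cid], 2))
        for cid in last
    }
```

A common convention sets the snapshot to one day after the last invoice date in the cleaned data, for example `max(t.invoice_date for t in cleaned) + timedelta(days=1)`. With that snapshot, the most recent buyers get a recency of at least one day.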

The approach prioritized interpretability and business usability, ensuring that segments could be understood and operationalized by sales and marketing stakeholders.
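The standardization and K-selection steps can also be sketched without dependencies. The sketch z-scores the RFM columns, runs Lloyd's K-Means for a candidate K, and reports inertia (used by the Elbow method) alongside the mean silhouette coefficient. All function names here are hypothetical, and initialization uses plain random sampling rather than k-means++. In practice, scikit-learn's StandardScaler, KMeans, and silhouette_score would typically replace these helpers.

```python
import math
import random
from statistics import mean, pstdev
from typing import List, Sequence, Tuple

Point = List[float]


def standardize(rows: Sequence[Sequence[float]]) -> List[Point]:
    """Z-score each column so recency (days), frequency (orders) and
    monetary (currency) contribute on comparable scales to Euclidean distance."""
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) or 1.0 for c in cols]  # guard against constant columns
    return [[(x - m) / s for x, m, s in zip(r, mus, sds)] for r in rows]


def dist2(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: List[Point], k: int, n_iter: int = 100, seed: int = 0) -> Tuple[List[int], float]:
    """Lloyd's algorithm with random initial centroids; returns (labels, inertia)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels: List[int] = []
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [min(range(k), key=lambda j: dist2(p, centroids[j])) for p in points]
        if new_labels == labels:
            break  # assignments stopped changing: converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = [mean(c) for c in zip(*members)]
    # Inertia (within-cluster sum of squares) is the quantity an Elbow plot tracks.
    inertia = sum(dist2(p, centroids[lab]) for p, lab in zip(points, labels))
    return labels, inertia


def silhouette(points: List[Point], labels: List[int]) -> float:
    """Mean silhouette coefficient; requires at least two clusters."""
    clusters = set(labels)
    scores = []
    for i, p in enumerate(points):
        own = [math.sqrt(dist2(p, q)) for j, q in enumerate(points)
               if labels[j] == labels[i] and j != i]
        if not own:
            scores.append(0.0)  # singleton cluster: silhouette defined as 0
            continue
        a = mean(own)  # mean distance to the point's own cluster
        b = min(       # mean distance to the nearest other cluster
            mean(math.sqrt(dist2(p, q)) for q, lab in zip(points, labels) if lab == c)
            for c in clusters if c != labels[i]
        )
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return mean(scores)
```

A typical sweep calls `kmeans(standardize(rows), k)` for each K in a small range (say 2 to 8). It then looks for the K where inertia stops falling sharply and silhouette stays high. The final choice of K can also reflect how interpretable the resulting segments are.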

Key Insights

  1. Strong Revenue Concentration: A small group of high-frequency, high-spend customers generated a disproportionate share of total revenue, confirming a clear Pareto effect.
  2. Large At-Risk Segment: Over one-third of customers showed long recency gaps, low frequency, and low spend, representing significant dormant revenue potential if reactivated.
  3. Clear Journey Progression: Customers followed predictable behavioral paths from new/low-value to potential loyalists, with identifiable intervention points to accelerate progression.
  4. Channel-Driven Value Differences: Referral and email acquisition channels produced higher-value customers, while marketplaces performed better as discovery channels than retention drivers.
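The document does not quantify the Pareto effect numerically. One way to measure revenue concentration is to compute the share of total revenue coming from the top fraction of customers ranked by monetary value. The sketch below shows this. The function name and the 20% default are illustrative assumptions, not figures from this analysis.

```python
from typing import Sequence


def top_customer_revenue_share(monetary: Sequence[float], top_fraction: float = 0.2) -> float:
    """Share of total revenue generated by the top `top_fraction` of customers by spend."""
    if not 0 < top_fraction <= 1:
        raise ValueError("top_fraction must be in (0, 1]")
    values = sorted(monetary, reverse=True)
    total = sum(values)
    if not values or total == 0:
        return 0.0
    # Number of customers in the top slice (rounded, at least one).
    n_top = max(1, round(len(values) * top_fraction))
    return sum(values[:n_top]) / total
```

A result well above `top_fraction` (for example, far more than 20% of revenue from 20% of customers) indicates the kind of concentration described in the first insight.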

Customer Segmentation Insights

Using K-Means clustering (K=4), four distinct customer segments emerged:

  1. Loyal Champions (18%)
    • Recent purchasers with high frequency and high spend.
    • Contribute a disproportionately high share of total revenue.
    • Often loyalty program members or referral-driven.
  2. Potential Loyalists (27%)
    • Moderately recent, mid-frequency customers.
    • Strong candidates for upsell, cross-sell, and loyalty nurturing.
  3. New / Low-Value Customers (19%)
    • Recently acquired or infrequent buyers.
    • Primary opportunity lies in driving second and third purchases.
  4. At-Risk / Churned (36%)
    • Long periods of inactivity with low current value.
    • Largest segment by size but underperforming in revenue.

This segmentation provided a clear, actionable framework for personalized marketing strategies.
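Percentages like the ones above come from comparing each segment's share of customers with its share of revenue. The sketch below computes both shares from per-customer segment labels and monetary values. The function name is hypothetical, and the segment labels in the usage are placeholders rather than output from this analysis.

```python
from collections import Counter, defaultdict
from typing import Dict, Hashable, Sequence, Tuple


def segment_profile(
    labels: Sequence[Hashable], monetary: Sequence[float]
) -> Dict[Hashable, Tuple[float, float]]:
    """Map each segment to (share of customers, share of revenue)."""
    counts = Counter(labels)
    revenue: Dict[Hashable, float] = defaultdict(float)
    for label, spend in zip(labels, monetary):
        revenue[label] += spend
    n_customers = len(labels)
    total_revenue = sum(monetary)
    return {
        label: (counts[label] / n_customers, revenue[label] / total_revenue)
        for label in counts
    }
```

A segment whose customer share far exceeds its revenue share matches the At-Risk / Churned pattern. The opposite imbalance matches Loyal Champions.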

Recommendations and Decision Support

Retention & Growth:

  • Prioritize Loyal Champions with VIP treatment, exclusive offers, and referral incentives.
  • Nurture Potential Loyalists through personalized cross-sell, upsell, and loyalty programs.

Reactivation & Nurturing:

  • Launch automated win-back campaigns for At-Risk customers triggered by recency thresholds.
  • Deploy onboarding and activation journeys for New/Low-Value customers to accelerate repeat purchases.
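Recency-triggered win-back and onboarding journeys can be expressed as simple rules over each customer's RFM values. In this sketch, the 90-day recency threshold and the one-order onboarding cutoff are hypothetical placeholders, not values from the analysis. The function name is also illustrative.

```python
from typing import Dict, Tuple

# Hypothetical thresholds for illustration only; tune against real purchase cycles.
WIN_BACK_RECENCY_DAYS = 90
ONBOARDING_MAX_ORDERS = 1


def campaign_triggers(
    rfm: Dict[str, Tuple[int, int, float]],
    win_back_days: int = WIN_BACK_RECENCY_DAYS,
    onboarding_max_orders: int = ONBOARDING_MAX_ORDERS,
) -> Dict[str, str]:
    """Route customers to an automated journey from (recency_days, frequency, monetary).

    Customers matching neither rule receive no automated trigger.
    """
    triggers: Dict[str, str] = {}
    for customer_id, (recency_days, frequency, _monetary) in rfm.items():
        if recency_days >= win_back_days:
            triggers[customer_id] = "win_back"    # lapsed past the recency threshold
        elif frequency <= onboarding_max_orders:
            triggers[customer_id] = "onboarding"  # recent, but only one order so far
    return triggers
```

The win-back rule is checked first, so a lapsed one-time buyer is routed to reactivation rather than onboarding.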

Operationalization:

  • Embed segment labels into CRM and marketing automation platforms.
  • Align paid media and social targeting with high-value lookalike profiles.
  • Use marketplaces as acquisition channels while migrating repeat buyers to NovaCart's own website and app.

These actions enable NovaCart to shift from broad, undifferentiated marketing to segment-led growth and retention strategies.

Business Impact

Based on industry benchmarks, the segmentation framework has the potential to:

  • Improve marketing ROI by reallocating spend toward high-value segments.
  • Increase Customer Lifetime Value (CLV) through targeted retention and upsell.
  • Convert at-risk customers back into active revenue contributors.
  • Improve executive visibility into customer behavior and journey effectiveness.

This case demonstrates how machine-learning-based RFM customer segmentation can be applied to improve retention, personalization, and revenue efficiency, transforming raw transaction data into a scalable decision-support framework for marketing executives.