← Back to portfolio

Online Retail II — Customer Value & Revenue

Who are the most valuable customers, and what really drives revenue for a UK online gift retailer? An end-to-end analysis of 1,067,371 transactions — cleaned, segmented (RFM), and turned into retention and revenue recommendations.

Tools: SQL (DuckDB) · Python (pandas, matplotlib) Data: UCI Online Retail II Period: Dec 2009 – Dec 2011 Links: Kaggle notebook · GitHub
1 · Introduction (Ask)

The business question

A UK-registered, non-store online retailer of all-occasion giftware wants to grow revenue and keep customers — but doesn't know who its valuable customers are or what drives sales. My job: turn ~1.07M raw transactions into clear answers marketing and operations can act on.

Assumptions: negative-quantity rows are returns/cancellations (not sales); rows priced ≤ £0 are samples/adjustments; postage & manual entries are non-product revenue, reported separately; sales without a Customer ID still count toward revenue but are excluded from customer-level analysis.
Headline numbers (after cleaning)

What the cleaned data shows

£19.6M
Product revenue
£18.2M
Net revenue (after returns)
−£1.46M
Returns (7.4% of revenue)
5,852
Identified customers
39,517
Orders
£497
Avg order value
72.4%
Repeat-customer rate
4,901
Products sold

A subtle data trap handled in cleaning: the source ships two yearly sheets that overlap in Dec 2010 — naïvely merging them double-counts ~£0.5M. De-duplicating on the business columns fixes it. Full cleaning log →

2 · Problems → 3 · Analyze & Share

Six questions, answered with evidence

Each chart answers one business question and states the fact it supports. Hover any chart for detail.

Revenue over time

Strong, repeatable Q4 seasonality — September to November is the money-making window every year.

Top products by revenue

A handful of homeware/gift lines drive a large share of sales.

Revenue by country

The UK is ~85% of revenue; some export markets are just 1–3 wholesale accounts (concentration risk).

Customer value — RFM segments

Champions are ~25% of customers but 69% of revenue. Loyalty, not breadth, is the engine.

Revenue leakage — returns

~7.4% of product revenue is lost to returns/cancellations — a real, addressable leak.

Retention — cohort analysis

% of each first-purchase cohort still buying in later months. First-month retention is modest, then a loyal core keeps re-purchasing.
4 · Solutions

Options & trade-offs

Revenue is concentrated in a small group of repeat customers, peaks hard in Q4, and loses ~7% to returns. Three options:

Recommended

A · Retention & win-back program

Champion perks + automated win-back for the "At risk (was loyal)" segment (£1.0M at stake).

  • + Targets the segment already driving most revenue
  • + Cheap vs. new-customer acquisition
  • – Needs CRM/email setup; won't grow the funnel top

B · Seasonal plan for Sep–Nov

Align marketing spend and inventory with the proven Q4 peak.

  • + Lifts the biggest months of the year
  • – Concentrates risk; stock-outs/over-stocks are costly if mis-forecast

C · Cut returns & de-risk geography

Investigate the most-returned SKUs; protect single-account export markets.

  • + Recovers part of the ~7% leak
  • – Root-cause work is slow; diversifying markets is a long game
5 · Conclusion & 6 · Next steps

Key takeaways

Recommended next steps — what · who · when

  1. 1
    Launch a tiered retention program (Champion perks + win-back for "At risk"). Who: CRM/Marketing · When: next quarter, measure repeat-rate uplift at 90 days.
  2. 2
    Build the Sep–Nov seasonal plan (budget + inventory locked by August). Who: Marketing + Ops · When: annually, sign-off by 31 Aug.
  3. 3
    Stand up a returns dashboard and investigate top returned SKUs. Who: Analytics + Merchandising · When: within 60 days.
  4. 4
    Key-account plan for single-customer export markets (EIRE, NL). Who: Sales · When: ongoing, quarterly review.