200,000 SKUs into an 8-level taxonomy — at 95%+ accuracy.

Automated category mapping for mcgrocer.com — combining LLM semantic understanding with vector similarity. Catalog work moved from months to hours, and vendor inventory hits the site within minutes of arrival.

The challenge

The catalog scaled faster than the people who maintained it — and search felt it first.

Vendor data, inconsistent shapes

Hundreds of thousands of products were pulled from supermarkets, wholesalers, and specialty retailers — each using different naming conventions, attribute fields, and source category structures.

Mis-categorised SKUs

Products regularly landed in wrong or irrelevant categories, breaking discovery and navigation. Shoppers found themselves filtering through noise, and conversion suffered.

Manual catalog operations

New items required hand review and reassignment. As volume scaled toward 200,000+ SKUs, the manual team became the rate-limiter on how fast new inventory could go live.

Taxonomy drift

Mapping was inconsistent across vendors and across analysts. The redesigned 8-level taxonomy lost its meaning the moment classification was left to individual judgement at scale.

Slow updates, broken search

Vendor pushes that should have hit the site in minutes took days — and the resulting catalog state made search and navigation feel unreliable to repeat shoppers.

What we built

An automated mapping engine, tuned to the long tail.

Engine

AI category mapping engine

Autonomously analyses titles, descriptions, ingredients, brand names, and attributes to identify the correct taxonomy path for each incoming SKU — no analyst in the loop for the common case.

Taxonomy

8-level taxonomy alignment

Maps each product across every level — broad department down to the deepest sub-category — so classification is consistent end-to-end and downstream search behaves predictably.
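
As a rough illustration of what mapping "across every level" implies, the sketch below models a taxonomy path as a fixed-depth tuple and rejects anything shorter. The class name, the `shared_depth` helper, and the example level names are all hypothetical; only the depth of 8 comes from this case study.

```python
from dataclasses import dataclass

TAXONOMY_DEPTH = 8  # depth stated in the case study; everything else is assumed


@dataclass(frozen=True)
class TaxonomyPath:
    """One complete path, from broad department to deepest sub-category."""
    levels: tuple[str, ...]

    def __post_init__(self) -> None:
        # A path is only valid if every one of the 8 levels is filled in.
        if len(self.levels) != TAXONOMY_DEPTH:
            raise ValueError(
                f"expected {TAXONOMY_DEPTH} levels, got {len(self.levels)}"
            )
        if any(not level.strip() for level in self.levels):
            raise ValueError("taxonomy levels must be non-empty")


def shared_depth(a: TaxonomyPath, b: TaxonomyPath) -> int:
    """Number of leading levels two paths have in common."""
    depth = 0
    for x, y in zip(a.levels, b.levels):
        if x != y:
            break
        depth += 1
    return depth
```

Comparing a predicted path with a reviewed one via `shared_depth` gives a per-level view of where a classification went wrong, which is one plausible way to measure accuracy "across all taxonomy levels".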

Model

LLM + vector similarity hybrid

Combines LLM semantic understanding with vector similarity scoring: the LLM handles nuance, while similarity grounds each result against products the system has already classified.
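
One way such a hybrid could be wired together is sketched below, under stated assumptions: embeddings are plain lists of floats, `llm_choose` is a hypothetical stand-in for whatever LLM call is used, and the fallback rule is ours rather than a description of the production engine. Vector similarity shortlists paths from already-classified products, the LLM picks among them, and an answer outside the shortlist is replaced by the nearest neighbour's path.

```python
import math
from typing import Callable, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def candidate_paths(
    query: Vector, classified: list[tuple[Vector, str]], top_k: int = 3
) -> dict[str, float]:
    """Paths of the top_k most similar classified products, with best score."""
    scored = sorted(
        ((cosine(query, vec), path) for vec, path in classified),
        key=lambda pair: pair[0],
        reverse=True,
    )[:top_k]
    best: dict[str, float] = {}
    for score, path in scored:
        best[path] = max(score, best.get(path, score))
    return best


def classify(
    title: str,
    query: Vector,
    classified: list[tuple[Vector, str]],
    llm_choose: Callable[[str, list[str]], str],  # hypothetical LLM wrapper
    top_k: int = 3,
) -> tuple[str, float]:
    """Return (path, similarity score) for one incoming product."""
    candidates = candidate_paths(query, classified, top_k)
    if not candidates:
        raise ValueError("no classified products to ground against")
    choice = llm_choose(title, list(candidates))
    if choice not in candidates:
        # Ungrounded answer: fall back to the nearest neighbour's path.
        choice = max(candidates, key=lambda path: candidates[path])
    return choice, candidates[choice]
```

Restricting the LLM to a retrieved shortlist keeps its output inside the existing taxonomy, and the returned similarity score can feed a downstream confidence check.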

Throughput

Automated batch processing

New products are processed in batches the moment they land from vendor feeds. No queue, no manual triage, no waiting room — inventory hits the site within minutes of arrival.
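
A minimal sketch of the batching step, assuming the vendor feed arrives as an iterable of SKU records and that `classify_batch` is a hypothetical callable standing in for the mapping engine. The batch size of 500 is an arbitrary placeholder, not a figure from this project.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator, TypeVar

T = TypeVar("T")


def batched(items: Iterable[T], size: int) -> Iterator[list[T]]:
    """Yield consecutive lists of at most `size` items from any iterable."""
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(items)
    while batch := list(islice(iterator, size)):
        yield batch


def process_feed(
    skus: Iterable[T],
    classify_batch: Callable[[list[T]], None],  # hypothetical engine entry point
    size: int = 500,
) -> int:
    """Classify a feed batch by batch; return how many SKUs were handled."""
    processed = 0
    for batch in batched(skus, size):
        classify_batch(batch)
        processed += len(batch)
    return processed
```

Because `batched` consumes the feed lazily, classification can start as soon as the first records land rather than after the whole file has been read.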

Quality

Confidence scoring & validation

Built-in rule checks and per-SKU confidence scores split incoming classifications into 'auto-publish' and 'send to human', so review effort concentrates on the genuinely ambiguous cases.
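
The routing described here might look something like the sketch below. The 0.90 threshold, the `Prediction` fields, and the single rule check (a complete, non-empty 8-level path) are illustrative assumptions; the case study does not state the actual rules or cut-off.

```python
from dataclasses import dataclass

AUTO_PUBLISH_THRESHOLD = 0.90  # hypothetical cut-off, not from the case study


@dataclass
class Prediction:
    sku: str
    path: tuple[str, ...]
    confidence: float  # assumed to be in the range 0.0 to 1.0


def passes_rules(pred: Prediction, expected_depth: int = 8) -> bool:
    """Example rule check: the path must fill every taxonomy level."""
    return len(pred.path) == expected_depth and all(
        level.strip() for level in pred.path
    )


def route(pred: Prediction, threshold: float = AUTO_PUBLISH_THRESHOLD) -> str:
    """Send a prediction to 'auto-publish' or 'send to human'."""
    if passes_rules(pred) and pred.confidence >= threshold:
        return "auto-publish"
    return "send to human"
```

Requiring both the rule check and the confidence score to pass means a high-confidence but structurally invalid prediction still reaches a reviewer.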

Operations

Continuous learning

The engine learns from corrections and historical patterns. Accuracy compounds as the catalog grows — and as taxonomy edits land, prior classifications can be re-evaluated in bulk.
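
To make the bulk re-evaluation idea concrete, here is a small sketch that records analyst corrections and finds every SKU sitting under a taxonomy branch that has just been edited. The `ReferenceSet` class and its method names are hypothetical, and storing assignments in an in-memory dict is a simplification.

```python
Path = tuple[str, ...]


class ReferenceSet:
    """Current SKU-to-path assignments plus a record of human corrections."""

    def __init__(self) -> None:
        self.assignments: dict[str, Path] = {}
        self.corrected: set[str] = set()

    def record(self, sku: str, path: Path) -> None:
        """Store an automatically assigned path."""
        self.assignments[sku] = path

    def correct(self, sku: str, path: Path) -> None:
        """Overwrite with a reviewed path and remember it was human-checked."""
        self.assignments[sku] = path
        self.corrected.add(sku)

    def affected_by(self, edited_prefix: Path) -> list[str]:
        """SKUs whose path starts with the edited branch, sorted by SKU."""
        depth = len(edited_prefix)
        return sorted(
            sku
            for sku, path in self.assignments.items()
            if path[:depth] == edited_prefix
        )
```

After a taxonomy edit, the list returned by `affected_by` is the set of prior classifications that would be queued for re-classification in bulk.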

Impact

Catalog at scale — without a manual team behind it.

95%+

Classification accuracy across all taxonomy levels

Hours, not months

Time to classify large vendor batches

200K+

SKUs held consistently in an 8-level taxonomy

Azure · LLM + vector hybrid · Confidence scoring · Batch automation · Continuous learning · 8-level taxonomy

Start the conversation

Got a similar problem? Talk to us.

Messy vendor feeds, drifting taxonomies, manual catalog teams as the bottleneck: we’ve cleaned this up before. One of our delivery pods can be classifying your catalog within weeks.