Automating Product Enrichment for 50,000 SKUs
50,000 SKUs. Enriched and standardized in 5 days.
The Challenge
A Series B e-commerce brand had grown through acquisition and rapid catalog expansion, resulting in 50,000 SKUs with wildly inconsistent product data. The same attribute might be recorded as "Color: Navy", as "Colour: Dark Blue", or not at all, depending on which team had entered the data. Product descriptions ranged from detailed paragraphs to single sentences.
The downstream impact was severe: on-site search returned irrelevant results, category filters were unreliable, and the recommendation engine had poor signal quality. Customer support tickets about "can't find product" had doubled in six months.
The Solution
We built a batch processing pipeline that reads the existing catalog data, enriches descriptions, extracts and standardizes attributes, and generates missing taxonomy labels. Claude Opus 4.6 handles semantic understanding - reading product descriptions and images to infer attributes that were never explicitly entered. GPT-5.4 performs structured attribute extraction, mapping free-text descriptions to a standardized attribute schema.
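To make the idea of a standardized attribute schema concrete, here is a minimal sketch of the deterministic normalization step that could sit behind the model output. It is not the pipeline's actual code: the alias tables and the function name `normalize_attribute` are hypothetical, and the only grounded detail is the "Color: Navy" / "Colour: Dark Blue" example from the catalog.

```python
from typing import Optional, Tuple

# Hypothetical alias tables. The real schema is not described in this case
# study; these entries only cover the example given in the text.
KEY_ALIASES = {"color": "color", "colour": "color"}
VALUE_ALIASES = {"color": {"navy": "navy", "dark blue": "navy"}}


def normalize_attribute(raw: str) -> Optional[Tuple[str, str]]:
    """Map a free-text 'Key: Value' string to a (key, value) pair in the schema.

    Returns None when the string has no recognizable key, so the caller can
    treat the attribute as missing instead of guessing.
    """
    if ":" not in raw:
        return None
    key_text, value_text = raw.split(":", 1)
    key = KEY_ALIASES.get(key_text.strip().lower())
    if key is None:
        return None
    value = value_text.strip().lower()
    # Unknown values pass through lowercased instead of being dropped.
    value = VALUE_ALIASES.get(key, {}).get(value, value)
    return key, value
```

With these tables, both "Color: Navy" and "Colour: Dark Blue" resolve to the same pair, which is the property that category filters and search depend on.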
Cross-validation between the two models catches inconsistencies: if one model classifies a product differently than the other, the item is flagged for human review. The pipeline processes approximately 10,000 SKUs per hour and outputs clean structured data formatted for direct import into their e-commerce platform.
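The cross-validation rule can be sketched as a simple comparison of the two models' outputs. This is an illustration under assumptions, not the production logic: the function names, the status labels, and the dictionary layout are all hypothetical, and each model's output is assumed to have already been reduced to a flat attribute dictionary.

```python
from typing import Dict, List


def find_disagreements(a: Dict[str, str], b: Dict[str, str]) -> List[str]:
    """Return attribute names where the two outputs differ or one is missing."""
    return sorted(k for k in a.keys() | b.keys() if a.get(k) != b.get(k))


def route_sku(sku: str, a: Dict[str, str], b: Dict[str, str]) -> Dict[str, object]:
    """Flag a SKU for human review when the two models disagree on anything."""
    conflicts = find_disagreements(a, b)
    # Hypothetical status labels: "agreed" items continue to batch approval,
    # "needs_review" items go to a person.
    status = "needs_review" if conflicts else "agreed"
    return {"sku": sku, "status": status, "conflicts": conflicts}
```

Treating a missing attribute as a disagreement is a deliberately conservative choice in this sketch: it sends more items to review, but avoids silently accepting a value that only one model produced.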
We also built a lightweight review dashboard where the merchandising team could approve batched changes before they went live, giving them control without requiring them to touch individual SKUs.
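One way batched approval could work is to group proposed changes that are identical apart from the SKU, so a reviewer approves one rule instead of many rows. The sketch below assumes a hypothetical change record of (sku, attribute, old value, new value); the dashboard's real data model is not described here.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical change record: (sku, attribute, old_value, new_value).
Change = Tuple[str, str, str, str]
BatchKey = Tuple[str, str, str]


def batch_changes(changes: Iterable[Change]) -> Dict[BatchKey, List[str]]:
    """Group changes by (attribute, old_value, new_value), listing affected SKUs."""
    batches: Dict[BatchKey, List[str]] = defaultdict(list)
    for sku, attribute, old_value, new_value in changes:
        batches[(attribute, old_value, new_value)].append(sku)
    return dict(batches)
```

Under this grouping, every SKU whose color changes from "dark blue" to "navy" lands in a single batch that can be approved or rejected as a unit.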
Results
"Our search finally works. Customers are finding products they didn't know we carried."
VP of Product - E-commerce Brand