The real cost of AI is not the model. It is the data pipeline.
Every AI business case focuses on model costs. Yet model costs are a minority of the total cost. The data pipeline is typically 60-70%.
Where AI budgets actually go (figure, based on cost reviews across 14 AI projects in production)
Model costs: 20-30%
Data pipeline costs: 60-70%. Buried in shared infrastructure and invisible to the business case: extraction (read replicas, CDC, API limits), transformation (cleaning, validation, dedup), embeddings and vectorisation, and storage (inputs, outputs, audit trail).
To reduce AI costs, start with the pipeline. To budget for AI, start with the pipeline.
The model is rarely where the money is. The data pipeline is.
Every AI business case focuses on model costs. GPU compute. API pricing. Training runs. These are visible, measurable, and usually the first line item in the budget.
Yet they are only a minority of the total cost.
The real cost of AI in production is the data pipeline that feeds it. And most organisations have no idea how much that costs because it is buried in shared infrastructure that existed before the AI project started.
Data extraction from source systems. These systems were not designed for the throughput AI needs. Adding extraction load impacts their performance. You need dedicated read replicas, CDC streams, or API rate limit increases. Each has a cost.
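One way to keep extraction from overwhelming a source system, or from blowing through an API quota, is to throttle the extractor on the client side. The sketch below is a minimal token-bucket limiter in Python; the class name, the rate and capacity values, and the `extraction_seconds` helper are hypothetical illustrations, not part of any particular tool. It also makes the constraint visible: at a fixed request rate, extraction time grows with data volume.

```python
import time
from typing import Callable


class TokenBucket:
    """Client-side token bucket: bursts of up to `capacity` requests,
    sustained at `rate` requests per second."""

    def __init__(self, rate: float, capacity: int,
                 clock: Callable[[], float] = time.monotonic) -> None:
        if rate <= 0 or capacity < 1:
            raise ValueError("rate must be > 0 and capacity >= 1")
        self.rate = rate
        self.capacity = capacity
        self.tokens = float(capacity)  # start with a full bucket
        self.clock = clock
        self.last = clock()

    def _refill(self) -> None:
        # Add tokens for the time elapsed since the last check, capped at capacity.
        now = self.clock()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now

    def try_acquire(self) -> bool:
        """Take one token if available; never blocks."""
        self._refill()
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

    def acquire(self) -> None:
        """Block until a token is available (assumes a real-time clock)."""
        while not self.try_acquire():
            time.sleep((1 - self.tokens) / self.rate)


def extraction_seconds(records: int, page_size: int, rate: float) -> float:
    """Approximate time to extract `records` when the source allows `rate`
    requests per second (ignores any initial burst allowance)."""
    pages = -(-records // page_size)  # ceiling division
    return pages / rate
```

In an extractor loop you would call `bucket.acquire()` before each page request to the source. The helper shows the trade-off: at 10 requests per second and 500 records per page, one million records takes roughly 200 seconds, so either the limit is raised (at a cost) or extraction moves to a read replica or CDC stream.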
Data transformation and cleaning. AI models are sensitive to data quality in ways that dashboards and reports are not. A missing field that a BI tool ignores will cause a model to produce nonsensical outputs. The transformation layer needs to be more rigorous, which means more compute, more storage, and more engineering time.
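The stricter transformation layer can be sketched as a validation-and-dedup step that quarantines bad records instead of passing them to the model. This is a minimal Python sketch; the record shape (`id`, `text`, `updated_at`) and the `REQUIRED_FIELDS` list are hypothetical, chosen only to show the kind of check a BI pipeline would typically skip.

```python
from dataclasses import dataclass, field
from typing import Any

REQUIRED_FIELDS = ("id", "text", "updated_at")  # hypothetical schema


@dataclass
class CleanResult:
    valid: list[dict[str, Any]] = field(default_factory=list)
    rejected: list[tuple[dict[str, Any], str]] = field(default_factory=list)


def clean(records: list[dict[str, Any]]) -> CleanResult:
    """Validate required fields, normalise text, and drop duplicate ids.

    A dashboard might silently ignore a missing field; here the record is
    quarantined with a reason so it never reaches the model.
    """
    result = CleanResult()
    seen_ids: set[Any] = set()
    for rec in records:
        missing = [f for f in REQUIRED_FIELDS if rec.get(f) in (None, "")]
        if missing:
            result.rejected.append((rec, f"missing: {', '.join(missing)}"))
            continue
        if rec["id"] in seen_ids:
            # Keeps the first record seen for each id.
            result.rejected.append((rec, "duplicate id"))
            continue
        seen_ids.add(rec["id"])
        # Collapse whitespace so trivially different copies produce the same input.
        result.valid.append({**rec, "text": " ".join(str(rec["text"]).split())})
    return result
```

Keeping the rejected records, with a reason, matters as much as dropping them: the rejection rate is a direct measure of how much cleaning work the source data needs. A production pipeline might prefer the record with the latest `updated_at` when deduplicating rather than the first one seen.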
Feature engineering and embedding generation. Converting raw data into model inputs requires compute that scales with data volume. When your data doubles, your feature pipeline cost doubles.
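The linear scaling is easy to make concrete with a back-of-the-envelope model. The sketch assumes embedding generation is billed per token; the function name, the tokens-per-record figure, and the price are placeholders to replace with your own measurements and your provider's actual pricing, not real quotes.

```python
def embedding_cost(records: int, avg_tokens_per_record: float,
                   price_per_million_tokens: float) -> float:
    """Estimated cost to embed `records` once, assuming per-token billing."""
    total_tokens = records * avg_tokens_per_record
    return total_tokens / 1_000_000 * price_per_million_tokens


if __name__ == "__main__":
    price = 0.10  # placeholder price per million tokens, not a real quote
    for records in (1_000_000, 2_000_000, 4_000_000):
        cost = embedding_cost(records, avg_tokens_per_record=500,
                              price_per_million_tokens=price)
        print(f"{records:>9,} records -> ${cost:,.2f}")
```

Every term in the product is a multiplier, so doubling the record count doubles the cost, and re-embedding the full corpus on every data change multiplies it again by the refresh frequency.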
Storage for model inputs, outputs, and evaluation data. AI governance requires you to store what the model saw and what it produced, at a minimum for audit purposes. This storage grows linearly with usage, and nobody budgets for it.
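A minimal audit trail can be as simple as appending one JSON line per model call, recording what the model saw and what it produced. The sketch below uses Python's standard `json` module; the field names, the `model_version` label, and both helper functions are hypothetical. Because every call appends a record and nothing is deleted, storage grows in direct proportion to request volume, which the projection helper makes explicit.

```python
import io
import json
from datetime import datetime, timezone
from typing import IO


def write_audit_record(sink: IO[str], request_id: str, model_version: str,
                       model_input: str, model_output: str) -> int:
    """Append one JSON Lines audit record; return the bytes written (UTF-8)."""
    record = {
        "request_id": request_id,
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "model_version": model_version,  # hypothetical label
        "input": model_input,    # what the model saw
        "output": model_output,  # what the model produced
    }
    line = json.dumps(record, ensure_ascii=False) + "\n"
    sink.write(line)
    return len(line.encode("utf-8"))


def projected_storage_gb(requests_per_day: int, avg_bytes_per_record: int,
                         days: int) -> float:
    """Audit storage after `days` with no deletion: linear in usage."""
    return requests_per_day * avg_bytes_per_record * days / 1e9


if __name__ == "__main__":
    sink = io.StringIO()  # stands in for an append-only file or object store
    size = write_audit_record(sink, "req-001", "model-v1",
                              "Summarise ticket 4711", "Customer reports a login error.")
    print(f"one record: {size} bytes")
    print(f"after a year at 100k requests/day: "
          f"{projected_storage_gb(100_000, size, 365):.1f} GB")
```

Real inputs for retrieval-augmented or multi-turn calls are usually far larger than this toy record, so the per-record size, not the request count alone, tends to drive the bill. A retention policy caps the growth, but only if someone budgets for deciding and enforcing it.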
When I do cost reviews for AI projects, the data pipeline is typically 60-70% of the total infrastructure cost. The model is 20-30%. The rest is monitoring and tooling. If you are building a business case for AI, start with the data pipeline costs. If you are trying to reduce AI costs, start with the data pipeline. The model is rarely where the money is.
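A cost review of this kind starts by tagging every infrastructure line item to a category, including the shared items that predate the AI project. The Python sketch below assumes a hypothetical line-item format in which a shared resource (for example, a read replica that also serves reporting) carries an estimated `ai_share`, the fraction of its cost attributable to the AI workload. The figures in the usage block are illustrative inputs only, not data from the reviews described above.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class LineItem:
    name: str
    monthly_cost: float
    category: str          # e.g. "model", "pipeline", "monitoring"
    ai_share: float = 1.0  # fraction attributable to the AI workload (0-1)


def cost_breakdown(items: list[LineItem]) -> dict[str, float]:
    """Return each category's share of AI-attributable cost, in percent."""
    totals: dict[str, float] = defaultdict(float)
    for item in items:
        if not 0.0 <= item.ai_share <= 1.0:
            raise ValueError(f"{item.name}: ai_share must be between 0 and 1")
        # Shared infrastructure is only partly charged to the AI project.
        totals[item.category] += item.monthly_cost * item.ai_share
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {cat: 100 * cost / grand_total for cat, cost in totals.items()}


if __name__ == "__main__":
    items = [  # illustrative figures only
        LineItem("LLM API usage", 3_000, "model"),
        LineItem("Read replica (shared with reporting)", 2_000, "pipeline", ai_share=0.8),
        LineItem("CDC stream", 1_200, "pipeline"),
        LineItem("Cleaning and validation jobs", 1_800, "pipeline"),
        LineItem("Embedding generation", 1_500, "pipeline"),
        LineItem("Audit log storage", 900, "pipeline"),
        LineItem("Monitoring and tracing", 800, "monitoring"),
    ]
    for category, pct in sorted(cost_breakdown(items).items(), key=lambda kv: -kv[1]):
        print(f"{category:<12} {pct:5.1f}%")
```

The `ai_share` estimate is where the judgement lies: if shared items are left at zero, the pipeline looks cheap and the model looks like the main cost, which is exactly the distortion a business case built on model pricing inherits.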
