Data | 28 May 2026

The real cost of AI is not the model. It is the data pipeline.

Every AI business case focuses on model costs, yet those are the minority of the total cost. The data pipeline is typically 60-70%.

Where AI budgets actually go

Based on cost reviews across 14 AI projects in production.

Above the waterline: what the business case covers

Model costs: 20-30% (GPU compute, API calls, training)

Below the waterline: buried in shared infrastructure, invisible to the business case

Data pipeline costs: 60-70%

Extraction: 15-20% (read replicas, CDC, API limits)
Transform: 15-20% (cleaning, validation, dedup)
Feature eng.: 15-20% (embeddings, vectorisation)
Storage: 10-15% (inputs, outputs, audit trail)

Monitoring and tooling: 5-10%

To reduce AI costs: start with the pipeline.
To budget for AI: start with the pipeline.

The model is rarely where the money is. The data pipeline is.

Every AI business case focuses on model costs. GPU compute. API pricing. Training runs. These are visible, measurable, and usually the first line item in the budget.

They are also the minority of the total cost.

The real cost of AI in production is the data pipeline that feeds it. And most organisations have no idea how much that costs because it is buried in shared infrastructure that existed before the AI project started.

Data extraction from source systems. These systems were not designed for the throughput AI needs. Adding extraction load impacts their performance. You need dedicated read replicas, CDC streams, or API rate limit increases. Each has a cost.

Data transformation and cleaning. AI models are sensitive to data quality in ways that dashboards and reports are not. A missing field that a BI tool ignores will cause a model to produce nonsensical outputs. The transformation layer needs to be more rigorous, which means more compute, more storage, and more engineering time.
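To make the missing-field point concrete, here is a minimal sketch of the kind of check a stricter transformation layer might run before records reach a model. The field names in `REQUIRED_FIELDS` and the `split_valid` helper are hypothetical, chosen only for illustration; a real pipeline would take its schema from the source systems.

```python
from typing import Any

# Hypothetical schema: the fields the model cannot do without.
REQUIRED_FIELDS = ("customer_id", "amount", "currency")


def split_valid(
    records: list[dict[str, Any]],
) -> tuple[list[dict[str, Any]], list[dict[str, Any]]]:
    """Separate records that are safe to feed a model from those that are not."""
    valid: list[dict[str, Any]] = []
    rejected: list[dict[str, Any]] = []
    for record in records:
        # Treat an absent key, None, and an empty string as "missing".
        missing = [f for f in REQUIRED_FIELDS if record.get(f) in (None, "")]
        if missing:
            rejected.append({"record": record, "missing": missing})
        else:
            valid.append(record)
    return valid, rejected


if __name__ == "__main__":
    sample = [
        {"customer_id": "c1", "amount": 10.0, "currency": "GBP"},
        {"customer_id": "c2", "amount": None, "currency": "GBP"},
    ]
    ok, bad = split_valid(sample)
    print(f"{len(ok)} valid, {len(bad)} rejected")
```

A BI tool would typically aggregate around the second record. Here it is held back with the reason attached, and that extra handling is part of the engineering time and compute the transformation layer absorbs.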

Feature engineering and embedding generation. Converting raw data into model inputs requires compute that scales with data volume. When your data doubles, your feature pipeline cost doubles.

Storage for model inputs, outputs, and evaluation data. AI governance requires you to store what the model saw and what it produced, at minimum for audit purposes. This storage grows linearly with usage and nobody budgets for it.
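The linear growth is easy to put a number on before it shows up in a bill. The sketch below is a back-of-envelope estimate, assuming every request's input and output is retained for a fixed period; the function name and the traffic and payload figures in the example are illustrative assumptions, not measurements.

```python
def audit_storage_gb(
    requests_per_day: int,
    bytes_per_request: int,
    retention_days: int,
) -> float:
    """Steady-state storage (decimal GB) needed to keep every request's
    input and output for the full retention period."""
    total_bytes = requests_per_day * bytes_per_request * retention_days
    return total_bytes / 1e9


if __name__ == "__main__":
    # Illustrative figures only: 100k requests/day, 20 kB stored per request,
    # one year of retention.
    baseline = audit_storage_gb(100_000, 20_000, 365)
    doubled = audit_storage_gb(200_000, 20_000, 365)
    print(f"baseline: {baseline:.0f} GB, doubled traffic: {doubled:.0f} GB")
```

Doubling usage doubles the retained volume, and so does doubling the retention period, which is why the figure belongs in the budget from the start.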

When I do cost reviews for AI projects, the data pipeline is typically 60-70% of the total infrastructure cost. The model is 20-30%. The rest is monitoring and tooling. If you are building a business case for AI, start with the data pipeline costs. If you are trying to reduce AI costs, start with the data pipeline. The model is rarely where the money is.
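One way to run the same check on your own bill is to tag each infrastructure line item with a category and compute the shares. The sketch below assumes that tagging has already been done. The category names and the monthly figures in the example are made up for illustration and are not taken from the reviews described above.

```python
# Hypothetical category names for the pipeline stages described in this post.
PIPELINE_CATEGORIES = {"extraction", "transform", "feature_engineering", "storage"}


def cost_shares(line_items: dict[str, float]) -> dict[str, float]:
    """Return the model, data pipeline, and other shares of total cost."""
    total = sum(line_items.values())
    if total <= 0:
        raise ValueError("total cost must be positive")
    pipeline = sum(
        cost for name, cost in line_items.items() if name in PIPELINE_CATEGORIES
    )
    model = line_items.get("model", 0.0)
    return {
        "model": model / total,
        "data_pipeline": pipeline / total,
        "other": (total - model - pipeline) / total,
    }


if __name__ == "__main__":
    # Made-up monthly costs, in any single currency.
    example = {
        "model": 2500.0,
        "extraction": 1800.0,
        "transform": 1700.0,
        "feature_engineering": 1600.0,
        "storage": 1400.0,
        "monitoring": 1000.0,
    }
    for name, share in cost_shares(example).items():
        print(f"{name}: {share:.0%}")
```

The hard part is not the arithmetic but the tagging: pipeline costs usually sit in shared infrastructure accounts, so attributing them to the AI workload is most of the work.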


Senna Semakula

Founder, Atruvo
