Insights: Cloud
28 March 2026

AI workloads are hiding in your cloud bill

Nobody knows what AI actually costs because inference runs on shared compute with no attribution. That is a platform architecture problem.

What your cloud bill actually looks like

AI costs buried in shared infrastructure. Nobody can tell you what AI actually costs.

Today

MONTHLY CLOUD INVOICE

Compute (general)     $48,200
Shared K8s cluster    $31,400
GPU instances         ???
LLM API calls         ???
Embedding jobs        ???
Storage + network     $12,100
Total                 $127,000

"We think AI is roughly 30% of the bill"

After attribution
AI workloads          $48,260
  GPU inference       $27,940
  LLM APIs            $13,970
  Embeddings          $6,350
Platform              $57,150
Other                 $21,590

Now you know where to cut.

You cannot optimise what you cannot measure. Attribution comes first.
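
To make the arithmetic in the illustration concrete, here is a minimal Python sketch that turns the "After attribution" figures into shares of the total. The dollar amounts are copied from the example invoice above; the variable names and the `share` helper are my own, for illustration only.

```python
# Figures copied from the illustrative "After attribution" breakdown.
attributed = {
    "AI workloads": 48_260,
    "Platform": 57_150,
    "Other": 21_590,
}
ai_breakdown = {
    "GPU inference": 27_940,
    "LLM APIs": 13_970,
    "Embeddings": 6_350,
}

total = sum(attributed.values())


def share(amount, whole):
    """Return amount as a percentage of whole, rounded to one decimal place."""
    return round(100 * amount / whole, 1)


ai_share = share(attributed["AI workloads"], total)

for name, amount in attributed.items():
    print(f"{name:<15} ${amount:>8,}  {share(amount, total):>5}%")
for name, amount in ai_breakdown.items():
    print(f"  {name:<13} ${amount:>8,}  {share(amount, total):>5}%")
```

On these numbers AI comes out at 38% of the $127,000 total, not the "roughly 30%" in the quote. That gap is the point of measuring.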

Your cloud bill went up 40% last quarter. Finance wants answers. Engineering says it is the AI workloads. But nobody can prove it because AI costs are not attributed separately.

This is the most common pattern I see in cloud cost reviews. AI inference runs on the same compute as everything else. GPU instances are shared across teams. LLM API calls go through a single account with no per-service breakdown. The embedding pipeline uses the same Kubernetes cluster as the web application.

The result: nobody knows what AI actually costs. Estimates range from "maybe 15%" to "probably 40%" of the total bill. Both numbers are guesses because nobody is measuring.

This matters for three reasons. First, you cannot optimise what you cannot measure. If AI inference is 35% of your compute bill, you should be looking at model distillation, caching, and batch inference. If it is 8%, that engineering effort is better spent elsewhere.

Second, AI cost growth is non-linear. A successful AI feature generates more usage, which generates more inference calls, which generates more embedding updates. Without cost attribution, you do not see the curve until it hits the budget ceiling.
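
One way to see the curve before it arrives is to project it. The sketch below assumes spend compounds at a constant monthly rate, which is a simplification of real usage growth. The function name, the 10% growth rate, and the $100,000 ceiling are hypothetical values chosen for illustration; only the $48,260 starting point comes from the example invoice above.

```python
def months_until_ceiling(current_spend, monthly_growth, ceiling):
    """Count whole months until compounding spend first reaches the ceiling.

    Returns 0 if spend is already at or above the ceiling, and None if
    growth is flat or negative (the ceiling is never reached).
    """
    if current_spend >= ceiling:
        return 0
    if monthly_growth <= 0:
        return None
    months = 0
    spend = current_spend
    while spend < ceiling:
        spend *= 1 + monthly_growth
        months += 1
    return months


# Hypothetical inputs: 10% month-on-month growth, $100,000 budget ceiling.
months = months_until_ceiling(48_260, 0.10, 100_000)
print(f"Budget ceiling reached in {months} months")
```

You can only run a projection like this if AI spend is already its own line item. Without attribution there is no starting number to feed in.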

Third, the CFO will eventually ask. When they do, "we think it is roughly 30%" is not an answer that builds confidence in the AI programme.

The fix is infrastructure, not finance.

  • Tag AI workloads at the resource level with separate node pools and dedicated namespaces
  • Route LLM API calls through an internal gateway that tracks per-service usage
  • Build cost dashboards that show AI spend as a first-class category
  • Set budget alerts on AI-specific resources before they surprise you
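
As a rough illustration of the gateway and budget-alert steps above, here is a minimal in-memory sketch in Python. Everything in it is an assumption for the sake of the example: the `UsageLedger` class, the service names, the flat price per 1,000 tokens, and the 80% alert threshold. A real gateway would persist usage, use the provider's actual pricing, and push alerts somewhere people look.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class UsageLedger:
    """Hypothetical per-service LLM usage tracker for an internal gateway."""

    price_per_1k_tokens: float  # assumed flat price, illustration only
    budgets: dict  # service name -> monthly budget in dollars
    alert_threshold: float = 0.8  # alert at 80% of budget (assumed)
    tokens: dict = field(default_factory=lambda: defaultdict(int))

    def record(self, service, tokens_used):
        """Attribute one LLM call's token count to the calling service."""
        self.tokens[service] += tokens_used

    def cost(self, service):
        """Return the service's spend so far, in dollars."""
        return self.tokens[service] / 1000 * self.price_per_1k_tokens

    def over_threshold(self):
        """Return services whose spend has reached the alert threshold."""
        return sorted(
            service
            for service, budget in self.budgets.items()
            if self.cost(service) >= self.alert_threshold * budget
        )


# Hypothetical services and budgets.
ledger = UsageLedger(
    price_per_1k_tokens=0.01,
    budgets={"search": 100.0, "support-bot": 50.0},
)
ledger.record("search", 2_000_000)
ledger.record("support-bot", 4_500_000)

for service in ledger.budgets:
    print(f"{service}: ${ledger.cost(service):.2f}")
print("Alert:", ledger.over_threshold())
```

The design choice that matters is that every call carries a service name at the point of entry. Once that exists, the dashboard and the alert are just queries over the same ledger.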

The teams that separate AI workloads early spend less and scale faster. The teams that leave them mixed in end up in a cost review where nobody can explain the numbers.


Senna Semakula

Founder, Atruvo
