AI workloads are hiding in your cloud bill
Nobody knows what AI actually costs because inference runs on shared compute with no attribution. That is a platform architecture problem.
[Figure: What your cloud bill actually looks like. AI costs are buried in shared infrastructure on the monthly invoice, so the best anyone can offer is "We think AI is roughly 30% of the bill". Attribution comes first.]
Your cloud bill went up 40% last quarter. Finance wants answers. Engineering says it is the AI workloads. But nobody can prove it because AI costs are not attributed separately.
This is the most common pattern I see in cloud cost reviews. AI inference runs on the same compute as everything else. GPU instances are shared across teams. LLM API calls go through a single account with no per-service breakdown. The embedding pipeline uses the same Kubernetes cluster as the web application.
The result: nobody knows what AI actually costs. Estimates range from "maybe 15%" to "probably 40%" of the total bill. Both numbers are guesses because nobody is measuring.
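Measuring starts with tags. The sketch below is a minimal illustration, assuming billing line items exported with a hypothetical `workload` tag. The names `LineItem`, `spend_by_workload`, and `ai_share_bounds` are made up for this example. Untagged spend lands in its own "unattributed" bucket, which is exactly where a 15% to 40% guessing range comes from. The AI share can only be stated as a range until tagging coverage closes that gap.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class LineItem:
    """One row of a billing export (hypothetical shape)."""
    service: str
    cost: float
    tags: dict[str, str]


def spend_by_workload(items: list[LineItem], tag_key: str = "workload") -> dict[str, float]:
    """Sum cost per tag value; rows missing the tag go to 'unattributed'."""
    totals: dict[str, float] = defaultdict(float)
    for item in items:
        totals[item.tags.get(tag_key, "unattributed")] += item.cost
    return dict(totals)


def ai_share_bounds(totals: dict[str, float], ai_value: str = "ai") -> tuple[float, float]:
    """Lower bound counts only tagged AI spend; upper bound assumes all untagged spend is AI."""
    total = sum(totals.values())
    if total == 0:
        return (0.0, 0.0)
    ai = totals.get(ai_value, 0.0)
    unattributed = totals.get("unattributed", 0.0)
    return (ai / total, (ai + unattributed) / total)


if __name__ == "__main__":
    # Illustrative numbers only, not real billing data.
    items = [
        LineItem("gpu-compute", 3500.0, {"workload": "ai"}),
        LineItem("llm-api", 1500.0, {"workload": "ai"}),
        LineItem("compute", 4000.0, {"workload": "web"}),
        LineItem("compute", 1000.0, {}),  # untagged resource
    ]
    low, high = ai_share_bounds(spend_by_workload(items))
    print(f"AI share: {low:.0%} to {high:.0%}")  # AI share: 50% to 60%
```

Every untagged resource you fix narrows the gap between the two bounds. Once the unattributed bucket is close to zero, the number is a measurement rather than a guess.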
This matters for three reasons. First, you cannot optimise what you cannot measure. If AI inference is 35% of your compute bill, you should be looking at model distillation, caching, and batch inference. If it is 8%, that effort is probably better spent elsewhere.
Second, AI cost growth is non-linear. A successful AI feature generates more usage, which generates more inference calls, which generates more embedding updates. Without cost attribution, you do not see the curve until it hits the budget ceiling.
Third, the CFO will eventually ask. When they do, "we think it is roughly 30%" is not an answer that builds confidence in the AI programme.
The fix is infrastructure, not finance.
- Isolate AI workloads in separate node pools and dedicated namespaces, and tag them at the resource level
- Route LLM API calls through an internal gateway that tracks per-service usage
- Build cost dashboards that show AI spend as a first-class category
- Set budget alerts on AI-specific resources before they surprise you
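The gateway and budget-alert items can be sketched together. This is a minimal illustration, not a production gateway. It assumes a hypothetical provider callable `call_model` that returns `(text, input_tokens, output_tokens)`, and hypothetical per-million-token prices. `UsageLedger` and `gateway_call` are illustrative names. Because every service calls the model through one function, every call is attributed to a service, and the budget check fires once per service when it crosses a threshold.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional

# Hypothetical prices in USD per million tokens; substitute your provider's real rates.
INPUT_PRICE_PER_M = 0.50
OUTPUT_PRICE_PER_M = 1.50


@dataclass
class UsageLedger:
    """Per-service spend ledger that an internal LLM gateway could maintain."""
    budgets: dict[str, float]      # monthly AI budget per calling service, in USD
    alert_ratio: float = 0.8       # warn once a service reaches 80% of its budget
    spend: dict[str, float] = field(default_factory=dict)
    alerted: set[str] = field(default_factory=set)

    def record(self, service: str, input_tokens: int, output_tokens: int) -> Optional[str]:
        """Attribute one call's cost to `service`; return an alert the first time it crosses the threshold."""
        cost = (input_tokens * INPUT_PRICE_PER_M + output_tokens * OUTPUT_PRICE_PER_M) / 1_000_000
        self.spend[service] = self.spend.get(service, 0.0) + cost
        budget = self.budgets.get(service)
        if budget is not None and service not in self.alerted and self.spend[service] >= self.alert_ratio * budget:
            self.alerted.add(service)
            return f"{service} at {self.spend[service]:.2f} of {budget:.2f} USD AI budget"
        return None


# Hypothetical provider call: takes a prompt, returns (text, input_tokens, output_tokens).
ModelCall = Callable[[str], tuple[str, int, int]]


def gateway_call(ledger: UsageLedger, service: str, prompt: str, call_model: ModelCall) -> str:
    """Single entry point for LLM calls, so spend is always attributed to a calling service."""
    text, input_tokens, output_tokens = call_model(prompt)
    alert = ledger.record(service, input_tokens, output_tokens)
    if alert:
        print("BUDGET ALERT:", alert)  # in practice, route this to your alerting system
    return text
```

An in-memory dict keeps the sketch short. A real gateway would persist the ledger and export per-service spend as metrics, so the same numbers feed the AI cost dashboard and the budget alerts.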
The teams that separate AI workloads early spend less and scale faster. The ones that do not end up in a cost review where nobody can explain the numbers.
