AI Governance · 13 May 2026

Every enterprise has an AI strategy. Almost none have an AI operations plan.

The board approved your AI strategy. But nobody planned how to run AI systems in production at 2am when the model starts returning garbage. That gap is where the next outage is hiding.

[Figure: What the board approved vs what production needs. Pattern: AI strategy without an operations plan. 14 organisations reviewed: 14 had a strategy, 0 had an operations plan.
Strategy (14/14 orgs), "The slide deck everyone approved": business case and ROI model; vendor selection (OpenAI, Anthropic, etc.); integration timeline and milestones; executive sponsorship; budget allocation; use case prioritisation. Ready for launch.
Operations (0/14 orgs), "The runbook nobody wrote": who gets paged at 2am; rollback procedure (under 10 min); quality degradation detection; real-time cost monitoring per service; vendor outage fallback path. Not ready for 2am.
Caption: Demos do not get paged at 2am. Production systems do.]

I have reviewed AI strategies at 14 organisations in the last 18 months. Every one had a strategy. A roadmap. Executive sponsorship. Approved budget. Not one had an operations plan.

An AI strategy tells you what to build. An AI operations plan tells you how to keep it running. Most organisations have the first. Almost none have the second.

Here is what I mean. Your AI strategy says you will deploy a customer-facing recommendation engine by Q3. It covers the business case, the vendor selection, the integration timeline. It does not cover what happens when the model starts recommending products that are out of stock. Or when inference latency spikes because a GPU node failed. Or when the training data pipeline breaks on a Friday night and nobody notices until Monday.
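Two of those failures can be caught with very small checks. The sketch below is illustrative only: the function names, the six-hour staleness threshold, and the idea of filtering recommendations against a set of in-stock product IDs are my assumptions here, not a description of any particular system.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical threshold: how old the last successful pipeline run may be
# before someone is alerted. Tune this per pipeline.
MAX_STALENESS = timedelta(hours=6)


def is_pipeline_stale(last_success: datetime, now: datetime,
                      max_staleness: timedelta = MAX_STALENESS) -> bool:
    """Return True when the last successful run is older than allowed."""
    return now - last_success > max_staleness


def filter_in_stock(recommended_ids: list[str], in_stock: set[str]) -> list[str]:
    """Drop recommended products that are not in stock, preserving order."""
    return [pid for pid in recommended_ids if pid in in_stock]


if __name__ == "__main__":
    last_run = datetime(2026, 5, 8, 22, 0, tzinfo=timezone.utc)
    checked_at = last_run + timedelta(hours=7)
    if is_pipeline_stale(last_run, checked_at):
        print("ALERT: training data pipeline has not succeeded recently")
    print(filter_in_stock(["sku-1", "sku-2", "sku-3"], {"sku-1", "sku-3"}))
```

Run on a schedule and wired to a pager, a check like the first one is the difference between finding out on Friday night and finding out on Monday.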

These are not edge cases. They are the normal operating conditions of AI systems in production. Every AI system I have seen in production has experienced at least one of these in its first 90 days.

An AI operations plan answers five questions that your strategy does not.

1. Who gets paged at 2am when the model degrades? Not the data scientist who trained it. The platform engineer who runs the infrastructure it sits on.
2. What is the rollback plan? Not "retrain the model." A specific, tested procedure that reverts to the previous version in under 10 minutes.
3. How do you detect quality degradation before customers do? Not uptime monitoring. Output quality tracking with automated regression alerts.
4. What is the cost ceiling? Not the budget for the project. A real-time view of what each AI service costs per day, with alerts when it exceeds thresholds.
5. What happens when the AI vendor has an outage? Not "we wait." A fallback path that degrades gracefully and tells users what is happening.
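The quality and cost questions both reduce to comparing a live number against a threshold and alerting when it is crossed. Here is a minimal sketch, assuming you already compute some quality score per response (how you score is up to you) and can read per-service spend for the day. `QualityMonitor`, `over_cost_ceiling`, and all the numbers are hypothetical.

```python
from collections import deque
from statistics import mean


class QualityMonitor:
    """Rolling-window check of an output quality score against a baseline."""

    def __init__(self, baseline: float, max_drop: float, window: int = 50):
        self.baseline = baseline    # score of the last known-good version
        self.max_drop = max_drop    # tolerated absolute drop before alerting
        self.scores = deque(maxlen=window)

    def record(self, score: float) -> bool:
        """Add a score; return True once a full window has regressed."""
        self.scores.append(score)
        window_full = len(self.scores) == self.scores.maxlen
        return window_full and mean(self.scores) < self.baseline - self.max_drop


def over_cost_ceiling(daily_cost: dict[str, float],
                      ceilings: dict[str, float]) -> list[str]:
    """Return the services whose spend today exceeds their ceiling.

    Assumption: a service with no configured ceiling is never flagged.
    """
    return sorted(service for service, cost in daily_cost.items()
                  if cost > ceilings.get(service, float("inf")))


if __name__ == "__main__":
    monitor = QualityMonitor(baseline=0.90, max_drop=0.05, window=3)
    for score in (0.80, 0.80, 0.80):
        if monitor.record(score):
            print("ALERT: output quality regressed against baseline")
    print(over_cost_ceiling({"search": 120.0, "chat": 40.0},
                            {"search": 100.0, "chat": 50.0}))
```

Waiting for a full window before alerting trades a little detection speed for fewer false pages. That is a design choice, not a rule.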

The organisations getting this right are the ones where the platform team is involved before the AI system goes live. They build the runbooks, the monitoring, the rollback procedures, and the cost controls alongside the model integration. Not after it.
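A rollback procedure is only fast if "the previous version" is something the system already knows about. One way to sketch that, assuming model versions are addressed by name and traffic follows a single "active" pointer (the `ModelRegistry` class and version names are hypothetical):

```python
class ModelRegistry:
    """Tracks which model version serves traffic and keeps history for rollback."""

    def __init__(self, initial_version: str):
        self._history = [initial_version]

    @property
    def active(self) -> str:
        """The version currently serving traffic."""
        return self._history[-1]

    def promote(self, version: str) -> None:
        """Make a new version active, remembering the one it replaces."""
        self._history.append(version)

    def rollback(self) -> str:
        """Revert to the previously active version and return its name."""
        if len(self._history) < 2:
            raise RuntimeError("no previous version to roll back to")
        self._history.pop()
        return self.active


if __name__ == "__main__":
    registry = ModelRegistry("recsys-v1")
    registry.promote("recsys-v2")
    print("rolled back to", registry.rollback())
```

The point is not the class. The point is that rollback becomes a pointer change you can rehearse, not a retraining job you hope finishes in time.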

The organisations getting it wrong are treating AI like a feature launch instead of an infrastructure deployment. They celebrate the go-live and then scramble when the first incident hits.

If you have an AI strategy but no operations plan, you are not ready for production. You are ready for a demo. And demos do not get paged at 2am.


Senna Semakula

Founder, Atruvo
