How do you measure whether an enterprise AI agent is ready to scale?

By the four-phase exit criteria, not by the model's benchmark scores. The readiness signal is operational: scoped identity is in place, the audit trail captures intent, discovery-mode plans match human-operator plans, governed execution has run a representative volume of writes with zero unrecoverable actions, and a governance cadence is established and attended. If any one of those is missing, the agent is not ready to scale, regardless of how the model performs in isolation.
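
The criteria above can be read as an all-or-nothing gate. The following is a minimal sketch of that gate, not a reference implementation: the `ReadinessEvidence` fields, the function names, and especially the `MIN_REPRESENTATIVE_WRITES` threshold are hypothetical, since what counts as a "representative volume" depends on the workload and is not specified here.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical threshold: "representative volume" must be set per workload.
MIN_REPRESENTATIVE_WRITES = 500


@dataclass(frozen=True)
class ReadinessEvidence:
    """Operational evidence gathered across the phases (field names are illustrative)."""
    scoped_identity_in_place: bool
    audit_captures_intent: bool
    discovery_plans_match_operators: bool
    governed_writes_executed: int
    unrecoverable_actions: int
    governance_cadence_attended: bool


def missing_criteria(
    evidence: ReadinessEvidence,
    min_writes: int = MIN_REPRESENTATIVE_WRITES,
) -> list[str]:
    """Return the exit criteria that are not yet met, in a fixed order."""
    checks = {
        "scoped identity in place": evidence.scoped_identity_in_place,
        "audit captures intent": evidence.audit_captures_intent,
        "discovery-mode plans match human-operator plans":
            evidence.discovery_plans_match_operators,
        "representative volume of governed writes":
            evidence.governed_writes_executed >= min_writes,
        "zero unrecoverable actions": evidence.unrecoverable_actions == 0,
        "governance cadence established and attended":
            evidence.governance_cadence_attended,
    }
    return [name for name, met in checks.items() if not met]


def ready_to_scale(evidence: ReadinessEvidence) -> bool:
    """The agent is ready only when no criterion is missing."""
    return not missing_criteria(evidence)


# Example: everything is in place except the governance cadence.
evidence = ReadinessEvidence(
    scoped_identity_in_place=True,
    audit_captures_intent=True,
    discovery_plans_match_operators=True,
    governed_writes_executed=1200,
    unrecoverable_actions=0,
    governance_cadence_attended=False,
)
print(ready_to_scale(evidence))    # False
print(missing_criteria(evidence))  # ['governance cadence established and attended']
```

Model benchmark scores are deliberately absent from the inputs: in this sketch, no benchmark result can offset a missing operational criterion.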