The Risk-Automation Spectrum

There’s a shift happening that I don’t think enough people are talking about. We’re not just generating text anymore — we’re triggering actions. Real ones. Agents are writing code, calling APIs, moving money, modifying infrastructure. The jump from “suggest an answer” to “go do the thing” is massive, and yet a lot of the conversation around agentic AI still feels stuck on benchmarks and vibe coding.

This mental model started crystallizing for me when I was leading generative AI efforts at a major healthcare and academic institution. In that world, “confidently wrong” isn’t just inconvenient — it’s life-altering. That experience fundamentally rewired how I think about AI autonomy, and it’s only been reinforced since, working across every other industry I’ve touched.

The thing that keeps me up at night isn’t whether the model is smart enough. It almost always is, especially with the right context and building blocks. The thing that keeps me up is what happens when it’s confidently wrong and nobody’s watching. Because in an agentic workflow, a bad output isn’t just a bad paragraph — it’s a bad action. And actions have consequences that are a lot harder to ctrl+z.

The Spectrum

Here’s the mental model I keep coming back to: autonomy should scale inversely with risk. That’s it. That’s the thesis. Simple to say, surprisingly hard to actually build around.

Think of it as three rough levels:

Full autonomy — The agent acts on its own. No human approval needed. This is where you want your low-risk, high-volume tasks. Summarizing a document, categorizing support tickets, generating a first draft of something. If the agent gets it wrong, the cost is low and the fix is easy. Let it rip.

Human-in-the-loop — The agent does the work, but a human reviews and approves before anything actually happens. This is the sweet spot for a huge chunk of enterprise workflows. Think code deployments, financial transactions, customer-facing communications. The agent handles the heavy lifting, but a human is the final gate (or one of several gates along the way). The key here is that the human isn’t doing the work — they’re validating it, perhaps even refining it. That’s a fundamentally different (and much faster) cognitive task.

Human-in-the-driver-seat — The agent assists, but the human is doing the core work. High-stakes decisions, legal reviews, medical diagnoses, anything where the consequences of being wrong are severe or irreversible. I’ve seen this firsthand — in healthcare, you don’t get to “iterate” on a bad decision. The AI is a copilot, not the pilot. It surfaces information, suggests options, catches things you might miss — but the human owns the outcome, and frankly most of the process.

The mistake I see teams make over and over is treating this like a binary. Either the AI is fully autonomous or it’s not being used at all. That’s leaving an enormous amount of value on the table. The spectrum is where the real utility lives.
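To make the tiers concrete, here’s a minimal sketch of a router that maps task risk to an autonomy level. The `stakes` labels, the `reversible` flag, and the mapping itself are illustrative stand-ins, not a prescription — the point is that the decision is explicit and lives in code, not in vibes:

```python
from enum import Enum


class Autonomy(Enum):
    FULL = "full"              # agent acts on its own
    IN_THE_LOOP = "hitl"       # agent proposes, human approves
    DRIVER_SEAT = "assist"     # human does the work, agent assists


def autonomy_for(reversible: bool, stakes: str) -> Autonomy:
    """Map a task's risk profile to an autonomy tier.

    stakes: "low" | "medium" | "high". Thresholds are illustrative.
    """
    if stakes == "high":
        return Autonomy.DRIVER_SEAT       # severe or irreversible consequences
    if stakes == "medium" or not reversible:
        return Autonomy.IN_THE_LOOP       # agent drafts, human gates
    return Autonomy.FULL                  # low-risk, easy to fix: let it rip
```

A summarization task (`reversible=True, stakes="low"`) routes to full autonomy; a payment (`reversible=False`) never does, even when the amount is small.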

Human-in-the-Loop Isn’t a Speed Bump

This is the misconception that genuinely frustrates me. I hear it constantly — “if you need a human in the loop, what’s the point of the AI?” As if the only value of automation is eliminating humans entirely. That’s not the goal. The goal is making the overall system faster, cheaper, and more reliable than either humans or AI could achieve alone.

Here’s a distinction that matters more than people realize: there’s a difference between human-in-the-loop at training time and human-in-the-loop at runtime. RLHF? That’s training time. We’re not talking about that. We’re talking about runtime HITL — the human who reviews, approves, or rejects an agent’s proposed action before it executes.

Runtime HITL isn’t a bottleneck. It’s a feature. It’s what lets you deploy agentic systems in high-stakes environments without holding your breath every time the agent runs. And the beautiful thing is that as trust builds — through logs, through track records, through verifiable reasoning — you can gradually extend the autonomy. The human manually approves less and less, not because you removed the guardrail, but because the system earned the right to operate with less oversight.

That’s how you actually get to full-ish autonomy for higher-risk tasks. You don’t skip the middle. You graduate.
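One way to sketch that graduation: a per-action-type trust ledger, where consecutive human approvals earn the right to auto-approve and a single rejection resets the streak. The threshold and the in-memory storage below are hypothetical stand-ins for whatever track record your logs actually give you:

```python
from collections import defaultdict


class TrustLedger:
    """Tracks per-action-type review outcomes; thresholds are illustrative."""

    def __init__(self, promote_after: int = 50):
        self.promote_after = promote_after
        # consecutive human approvals, keyed by action type
        self.streak: dict[str, int] = defaultdict(int)

    def record(self, action_type: str, approved: bool) -> None:
        if approved:
            self.streak[action_type] += 1
        else:
            self.streak[action_type] = 0  # a rejection resets earned trust

    def can_auto_approve(self, action_type: str) -> bool:
        return self.streak[action_type] >= self.promote_after
```

The guardrail never disappears — the ledger is still consulted on every run — but the human stops being asked about action types that have earned a clean track record.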

Engineering the Guardrails

Okay, so if the spectrum is the “what,” this is the “how.” And honestly, this is where I think the industry is still figuring things out. But there are a few patterns that I’ve seen work well.

Context engineering over prompt engineering. Everyone obsesses over the prompt, but the real leverage is what context you feed the agent, when you feed it, and how you structure it. What tools does it have access to? What data can it see? What actions can it take? These are architectural decisions, not prompt decisions. The NIST AI Risk Management Framework calls this “Govern” and “Manage” — defining the boundaries of what the system can do before it ever runs. I like to think of it as “policy before prompt.” Get the constraints right, and the prompt can actually do what it’s supposed to. “Garbage in, garbage out” is still very much a thing.
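A hedged sketch of “policy before prompt.” The policy object, tool names, and budget below are all hypothetical; the shape is the point — an allowlist and an action budget, checked in code before any tool call executes, regardless of what the model asked for:

```python
# Declared before the agent ever runs: what it may touch, and how much.
POLICY = {
    "tools": {"search_docs", "summarize"},   # allowlist, not a denylist
    "data_scopes": {"public_kb"},            # what data the agent can see
    "max_actions_per_run": 10,               # hard budget per invocation
}


def check_tool_call(tool: str, actions_taken: int) -> None:
    """Enforce the policy on a proposed tool call; raise if out of bounds."""
    if tool not in POLICY["tools"]:
        raise PermissionError(f"tool {tool!r} is outside policy")
    if actions_taken >= POLICY["max_actions_per_run"]:
        raise RuntimeError("action budget for this run exhausted")
```

Nothing here touches a prompt. If the model hallucinates a `delete_db` tool, the call dies at the boundary instead of executing.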

Idempotency. This one is huge and almost never discussed in the AI context. If your agent is going to take actions — especially actions that modify state — those actions need to be safely repeatable. If the agent retries a failed API call, you don’t want it to double-charge someone or create duplicate records. This isn’t a new concept (anyone who’s built distributed systems is nodding right now), but it’s critical for agentic orchestration and I rarely see it treated as a first-class concern.
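Here’s the pattern in miniature. The in-memory store and the `charge` function are stand-ins for a real payment API, but the mechanism — replaying the same idempotency key returns the original result instead of repeating the side effect — carries over directly to agent retries:

```python
import uuid

# In-memory stand-in for a durable idempotency store (e.g., a database table).
_processed: dict[str, dict] = {}


def charge(idempotency_key: str, customer: str, cents: int) -> dict:
    """Charge a customer; replaying the same key is safe.

    If the agent retries after a timeout, the stored result is returned
    and no second charge is created.
    """
    if idempotency_key in _processed:
        return _processed[idempotency_key]
    result = {"charge_id": str(uuid.uuid4()), "customer": customer, "cents": cents}
    _processed[idempotency_key] = result  # record before returning
    return result
```

The agent generates one key per intended action (not per attempt), so however many times the orchestrator retries, exactly one charge exists.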

Verifiable reasoning traces. The agent should show its work. Not just the final output, but the chain of reasoning that got it there. This isn’t about explainability for its own sake — it’s about giving the human reviewer (remember them?) something meaningful to validate. If I can see why the agent made a decision, I can approve or reject it in seconds. If I just see the decision itself, I’m basically starting from scratch. The difference between a 5-second review and a 5-minute review is the difference between HITL being practical and being a bottleneck. These traces can be emitted programmatically, as structured logs, or inline in the agent’s actual response. I prefer both.
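A sketch of what a reviewable trace might look like as a data structure (the field names are my own, not a standard): each step pairs a claim with the evidence behind it, so the reviewer validates the chain rather than re-deriving the decision from scratch:

```python
from dataclasses import dataclass, field


@dataclass
class TraceStep:
    claim: str     # what the agent believes at this step
    evidence: str  # where that belief came from


@dataclass
class AgentDecision:
    action: str                                        # the proposed action
    trace: list[TraceStep] = field(default_factory=list)

    def render(self) -> str:
        """Human-reviewable summary: the action plus the chain behind it."""
        lines = [f"PROPOSED: {self.action}"]
        for i, step in enumerate(self.trace, 1):
            lines.append(f"  {i}. {step.claim}  [source: {step.evidence}]")
        return "\n".join(lines)
```

The same structure serves both audiences: `render()` for the human gate, the raw dataclasses for programmatic checks and audit logs.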

These aren’t hypothetical patterns. They’re the difference between an agentic system that works in a demo and one that works in production.

Trust Is the Currency

Here’s what it all comes down to: trust. And trust isn’t binary either — it’s a spectrum, just like the autonomy it enables.

We’re in a weird moment where the models are genuinely capable enough to do real, consequential work, but our architectures haven’t caught up. We’re still building systems that are either fully locked down or fully open, and neither extreme serves us well. The capability-control gap is real — the distance between what these models can do and what we can safely let them do is growing, not shrinking.

If you’re building agentic systems right now, take a hard look at where your workflows sit on the risk spectrum. Match the autonomy level to the actual stakes. Build in the review loops. Make the reasoning visible. And then — gradually, deliberately — let the system earn more trust over time.
