Workflow vs Agent: The AI Architecture Decision That Matters

Two camps. Again.

Team Workflow says: decompose the work into steps, predefine the paths, orchestrate them. The LLM is one tool among several in your graph, and it does its part at the nodes you marked for it. The structure is engineered.

Team Agent says: give the model a goal, give it tools, give it freedom. Let it plan, act, observe, and revise. The structure is emergent. If you specify the graph, you've already capped how good the system can be.

Anthropic put it cleanly in Building Effective Agents, back in late 2024: workflows are LLM calls orchestrated through predefined paths; agents are systems where the LLM drives its own loop. That distinction looks small in print. In practice it is the most consequential architecture choice an AI engineer makes today, because it decides cost, debuggability, governance, and what kind of mistakes the system is allowed to make.
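
To make that distinction concrete, here is a minimal sketch of both shapes. Everything in it is hypothetical: fake_llm is a stub standing in for a real model call, and run_workflow, run_agent, and the tool names are illustrative, not taken from any library. The only point is who owns the control flow.

```python
from typing import Callable, Dict, List


def fake_llm(prompt: str) -> str:
    """Hypothetical stand-in for a model call; a real system would hit an LLM API."""
    if prompt.startswith("classify:"):
        return "billing" if "invoice" in prompt.lower() else "general"
    if prompt.startswith("summarize:"):
        return prompt[len("summarize:"):].strip()[:40]
    if prompt.startswith("next_action:"):
        # The stub "decides" it is done once a lookup appears in its history.
        return "finish" if "lookup ->" in prompt else "lookup"
    return ""


def run_workflow(ticket: str) -> Dict[str, object]:
    """Workflow: the engineer fixed the path; the model fills in two nodes."""
    label = fake_llm(f"classify: {ticket}")
    summary = fake_llm(f"summarize: {ticket}")
    return {"label": label, "summary": summary, "path": ["classify", "summarize"]}


def run_agent(goal: str, tools: Dict[str, Callable[[str], str]], max_steps: int = 5) -> List[str]:
    """Agent: the model picks each next action until it decides to finish."""
    history: List[str] = []
    for _ in range(max_steps):
        action = fake_llm(f"next_action: goal={goal} history={history}")
        if action == "finish" or action not in tools:
            break
        history.append(f"{action} -> {tools[action](goal)}")
    return history
```

In run_workflow the path is readable before anything runs. In run_agent the path only exists afterwards, as the returned history.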

This is the fourth post in Don't Pick a Side. Same premise as the others: the loudest debates in AI engineering tend to look like opinions and resolve into context-dependent decisions. Workflow vs agent is the cleanest example yet, because the industry now has formal language for both — which means we no longer have an excuse to confuse them, and we still do, daily.

Why this is now an architecture decision, not a vibe

For a long time, "agentic" was marketing. In 2026, three things have made the choice between workflow and agent a real engineering policy with real consequences.

First, the frontier models finally cleared the bar for genuine autonomous loops. Planning is better. Tool use is better. Recovery from mistakes is better. Two years ago, "true" agentic behavior was either expensive theater or actually-a-workflow-in-disguise. Today, there are tasks where giving the model an open loop produces results that no graph an engineer wrote in a week could match. The capability frontier moved. The economics did too.

Second, the cost gap is huge and visible. A workflow that does the same job as an agent — assuming the workflow author can model the job at all — is typically an order of magnitude cheaper. Sometimes more. At any non-toy scale, defaulting to "agent" because it sounds modern is the kind of decision that shows up in your monthly bill as a category, not a rounding error. The argument that "agents are worth it" now has to be made line by line.

Third, governance noticed. Workflows are auditable by inspection: you can read the graph and tell what the system is allowed to do. Agents are auditable only by trajectory analysis, which means you need to capture, store, and review every run. Compliance frameworks designed around the workflow model don't transfer cleanly. Whether the agent model is even auditable enough for regulated domains is being argued right now in places that matter.

So three forces. Capability says reach for agents more. Cost says reach for agents less. Governance says know which one you've actually built. All three are right.

Steelman: Workflow

The strongest version of Team Workflow is not "agents are scary." It's something more uncomfortable.

Predictability is the engineering virtue. A graph you can read is a system you can reason about, test, monitor, and debug. You can write integration tests. You can replay. You can swap nodes. You can swap models per node — cheap classifier in front, expensive reasoner where you actually need judgment, deterministic code where the answer is already known. A true agent does none of this cleanly. You can replay its transcript and re-read what it decided, but you cannot easily replace its decisions or test them in isolation.
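
A rough sketch of what "swap nodes" and "test in isolation" can look like, assuming a graph is nothing more than named functions run in a fixed order. The node names and the run_graph helper are made up for illustration; real orchestration frameworks have their own abstractions.

```python
from typing import Callable, Dict, List

Node = Callable[[str], str]


def cheap_classifier(text: str) -> str:
    """Hypothetical cheap node: a keyword rule standing in for a small model."""
    return "urgent" if "outage" in text.lower() else "routine"


def deterministic_decision(label: str) -> str:
    """Plain code where the answer is already known; no model involved."""
    return {"urgent": "page on-call", "routine": "queue"}.get(label, "queue")


def run_graph(nodes: Dict[str, Node], order: List[str], payload: str) -> List[str]:
    """Run the named nodes in a fixed order and keep a step trace."""
    trace: List[str] = []
    for name in order:
        payload = nodes[name](payload)
        trace.append(f"{name}: {payload}")
    return trace


NODES: Dict[str, Node] = {"classify": cheap_classifier, "decide": deterministic_decision}
ORDER = ["classify", "decide"]

# Swapping a node is a dictionary update; the rest of the graph is untouched.
SWAPPED: Dict[str, Node] = {**NODES, "classify": lambda text: "routine"}
```

Each node is an ordinary function, so it can be unit-tested on its own, and run_graph(SWAPPED, ORDER, ...) replays the same path with one decision replaced.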

Workflows compose. Agents tend to be monoliths. A well-built workflow exposes its pieces (the classifier, the extractor, the router, the summarizer), and those pieces show up in two other products next quarter. The same pieces built for an agent tend to stay locked inside it, because the agent's loop is the system. You buy reuse from workflows. You buy capability from agents. Pretending the trade-off doesn't exist is naive.

Most production "agents" are workflows in costume. This is the uncomfortable empirical claim. If you look at the agentic systems that have shipped, hardened, and survived in serious environments, almost all of them are graphs with one or two nodes where the LLM is given some flexibility — a router with a fallback, a step that can retry, an orchestrator with a small set of choices. The fully open agent loop is rare in production. It's almost always present in demos. The workflow camp has earned the right to point at this gap.

You can't test what you can't bound. A test suite over a workflow runs the graph. A test suite over an agent runs one trajectory through the agent. Statistical evaluation on agents is a real, useful discipline, and it is not the same thing as deterministic testing. If your domain demands deterministic testing — anything regulated, anything safety-critical — workflow wins by default. Nobody told the autonomy camp that this is settled, but for those domains, it is.
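
The difference between the two kinds of testing fits in a few lines. This is a sketch under loud assumptions: route is a toy workflow node, stub_agent is a hypothetical agent run that succeeds with a made-up probability, and the 0.8 bar is an arbitrary placeholder, not a recommended threshold.

```python
import random


def route(ticket: str) -> str:
    """Deterministic workflow node: same input, same branch, every time."""
    return "billing" if "invoice" in ticket.lower() else "general"


def stub_agent(ticket: str, rng: random.Random) -> bool:
    """Hypothetical agent run; the 0.9 success probability is invented for the sketch."""
    return rng.random() < 0.9


def pass_rate(ticket: str, runs: int, seed: int) -> float:
    """Statistical evaluation: sample many trajectories and report the success rate."""
    rng = random.Random(seed)
    successes = sum(stub_agent(ticket, rng) for _ in range(runs))
    return successes / runs


def meets_bar(ticket: str, runs: int = 200, seed: int = 0, bar: float = 0.8) -> bool:
    """An agent 'test' is a rate compared against a bar, not an equality check."""
    return pass_rate(ticket, runs, seed) >= bar
```

The workflow node gets an exact assertion such as route("Invoice #12 is wrong") == "billing". The agent gets a sampled rate and a threshold someone had to choose and defend.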

The cleanest version of Team Workflow is: engineering is about constraining behavior to what you can defend. The graph is the constraint. Removing it doesn't make the system smarter — it makes the system harder to be responsible for.

Steelman: Agent

The strongest version of Team Agent is not "graphs are dead." It's something more practical.

The world doesn't fit on your graph. Every workflow author has had this experience: you draw the diagram, you ship it, and within a week a real user does something that wasn't on your diagram. The workflow stalls, or routes to a dead branch, or politely refuses. An agent has a chance. The whole point of an LLM is that it generalizes beyond what you specified — and using it only at the nodes you specified is, in a real sense, wasting most of it.

The maintenance cost of a living graph is brutal. Every new edge case is a code change. Every shifted business rule is a node update. The graph rots in a different way than code rots: it rots because the world keeps moving and your diagram is from last quarter. With an agent, edge cases get absorbed by capability. The maintenance load doesn't go away, but its shape changes from "update the graph" to "improve the prompt and tools" — and the latter scales better as the work surface grows.

Composite tasks are agentic by nature. Research, then decide, then act, then notice the result, then revise — this is what humans do all day. Forcing it into a DAG either kills the substance or produces a DAG so complex it's just a bad agent with extra steps. There are jobs where the workflow model lies to you about what kind of task you have. Customer research, deep diagnosis, exploratory analysis, drafting that has to react to its own draft — these are not graph-shaped, and pretending they are creates worse systems than admitting they aren't.

The economics moved. "Agents are too expensive" was an argument made at last year's token prices, with last year's planning quality, and last year's tool-use reliability. None of those three is the same anymore. The cost gap is narrowing, and mostly from the agent side: workflows are not getting much cheaper (the LLM call is already most of the cost), while agents are doing more useful work per token because the underlying models reason better. The breakeven is in motion, and any team that locked in "always workflow" two years ago and never re-evaluated is now defending a stale architecture for stale reasons.

The cleanest version of Team Agent is: you cannot engineer your way to capability that the model already has, by hiding the model behind nodes you wrote yourself. The agent loop is the model expressing what it can do. Constraining it is sometimes correct, but it's never free.

Where I actually land

I don't pick a side. I default to workflow and reach for agent when the workflow has to fight the work.

Default to workflow when:

The work has natural seams. Classify, parse, route, fetch, summarize. If you can list the steps, the steps are the system.
Cost matters. Workflows are typically an order of magnitude cheaper per useful output.
The domain is regulated or auditable. The graph is your audit trail.
The system has to compose with others. Workflows expose reusable nodes; agents tend to keep their pieces inside.
You need deterministic tests. They exist for workflows. They don't, really, for agents.

Reach for agent when:

The work is genuinely open-ended planning. Research, deep diagnosis, multi-step investigation that branches on what it finds.
Edge cases dominate. The shape of the work is "mostly novel, every time" — and writing a graph for it would be writing a graph that's stale on arrival.
The capability gap is real. There are jobs where the model in an open loop produces results that no human-authored graph can. When that's true, fighting it is expensive and you're going to lose.
You have the trajectory infrastructure. Logging, evaluation, rollback, cost capping. Without these, "agent" is just "unaudited capability in production." That's not engineering, that's gambling.

A test I keep coming back to: can you describe the happy path of the system on a whiteboard in ten minutes? If yes, it's a workflow. If you keep saying "and then it depends on what comes back, and then it might branch into…" — that's an agent trying to be born inside a workflow's body. Let it.

And the dirty secret most production teams have already discovered: the right answer most of the time is a workflow with a small agent for the rare cases where the workflow stalls. A cheap classifier sits at the front; ninety-five percent of inputs follow a graph; five percent fall through to an agent loop with a tight budget and a hard cap. You get workflow economics on the bulk of the work, agent capability where you genuinely need it, and a clean boundary between the two. Same lesson as the other articles in this series: pick the appropriate mode per situation, run the loop honestly, and stop using either choice as a flag.
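
Here is one way that hybrid boundary might look, as a sketch only. The route table, the keyword classifier, and agent_fallback are all hypothetical placeholders; in a real system the classifier would be a model and the fallback would be an actual capped agent loop.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class Result:
    mode: str    # "workflow" or "agent"
    output: str


# Hypothetical graph branches; each would be a real sub-workflow in production.
WORKFLOW_ROUTES: Dict[str, Callable[[str], str]] = {
    "refund": lambda text: "refund form sent",
    "password": lambda text: "reset link sent",
}


def classify(text: str) -> Optional[str]:
    """Stand-in for the classifier at the front; None means no branch fits."""
    lowered = text.lower()
    for label in WORKFLOW_ROUTES:
        if label in lowered:
            return label
    return None


def agent_fallback(text: str, max_steps: int = 5) -> str:
    """Placeholder for the capped agent loop that handles the leftovers."""
    return f"escalated to agent loop (hard cap: {max_steps} steps)"


def handle(text: str) -> Result:
    """Known shapes follow the graph; everything else falls through to the agent."""
    label = classify(text)
    if label is not None:
        return Result("workflow", WORKFLOW_ROUTES[label](text))
    return Result("agent", agent_fallback(text))
```

The mode field on the result is the boundary made visible: counting it over a week of traffic tells you whether the split is really ninety-five to five.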

The honest sub-debate that's hiding

Most teams arguing about workflow vs agent are arguing about the wrong thing. The actual disagreement is almost always about a hidden third option.

The agent camp, when you press them, often just wants the LLM to be allowed to chain tool calls without an engineer writing two hundred lines of orchestration each time. That's a fair complaint. But that's not "agent." That's "a workflow with a flexible internal router." Anthropic's own writeup makes this distinction. So does the OpenAI Agents SDK. Calling it "agent" is overclaiming, and the workflow camp is correct to push back.

The workflow camp, when you press them, often just doesn't want to be asked to debug a transcript of an autonomous loop. Also fair. But forbidding any flexibility in tool chaining means you'll write the same orchestration code over and over, badly, and the LLM will sit at the nodes you allowed it to occupy, doing less than it could.

The honest third position — guided autonomy, controlled agent, whatever the next book calls it — is where most of production already lives. A workflow that has, at a few specific nodes, a bounded loop where the LLM can pick from a small set of tools, with a hard step limit and a cost cap, is neither workflow nor agent in the pure sense. It's the working compromise that almost nobody is shipping under that name, but almost everyone is shipping in practice.
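
A bounded loop of that kind is small enough to sketch. The assumptions here are mine: choose is a hypothetical policy standing in for the model's decision, cost is charged as a flat amount per step, and the status names are invented for the example.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

Trace = List[Tuple[str, str]]


@dataclass
class LoopResult:
    status: str                      # "done", "step_limit", "cost_cap", or "rejected"
    trace: Trace = field(default_factory=list)
    cost: float = 0.0


def bounded_loop(
    choose: Callable[[Trace], str],  # hypothetical model policy: trace -> tool name or "done"
    tools: Dict[str, Callable[[], str]],
    max_steps: int,
    cost_cap: float,
    cost_per_step: float,
) -> LoopResult:
    """Let the policy pick from a small tool set, under a step limit and a cost cap."""
    result = LoopResult(status="step_limit")
    for _ in range(max_steps):
        if result.cost + cost_per_step > cost_cap:
            result.status = "cost_cap"
            return result
        result.cost += cost_per_step  # charge for the call that picks the next action
        choice = choose(result.trace)
        if choice == "done":
            result.status = "done"
            return result
        if choice not in tools:
            # Outside the allowed set: record it and stop rather than improvise.
            result.trace.append((choice, "rejected: not an allowed tool"))
            result.status = "rejected"
            return result
        result.trace.append((choice, tools[choice]()))
    return result
```

Every exit is named, so a run that stopped because it finished and a run that stopped because it hit a limit are never confused in the logs.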

If you're arguing about workflow vs agent, the question you should be asking is: which nodes in this system should get bounded autonomy, and how bounded? That's the version of the debate that produces architecture decisions instead of identity statements.

What I'd ask any team to write down

If you ship anything serious that uses agents, this is the version of the discussion worth having:

For every system you operate: workflow, agent, or guided? Don't leave it implicit. If you don't know, it's a guided agent that's about to become an incident.
What's the cost ceiling per task, per mode? Workflows need budgets too, but a runaway workflow is rarer than a runaway agent loop. Agents need explicit caps, so cap the agent loop hard.
Where is the trajectory captured? For workflows: step trace. For agents: full transcript with tool inputs and outputs. For guided: both. Without this, your debugging story is hope.
What's the rollback mode when the autonomy makes a wrong move? A wrong move is not a bug. It's the cost of the autonomy. The question is how the system contains it.
Who approves promoting a guided system to a true agent? Because that decision unlocks new value and new failure modes. It shouldn't be drift. It should be a deliberate architectural step.
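
One way to force those answers onto paper is to make them a record that can be checked. This is a sketch, not a standard: the field names, the capture labels, and the gaps function are all hypothetical, chosen to mirror the five questions above.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class Mode(Enum):
    WORKFLOW = "workflow"
    GUIDED = "guided"
    AGENT = "agent"


# Capture expected per mode, following the list above.
REQUIRED_CAPTURE = {
    Mode.WORKFLOW: "step_trace",
    Mode.AGENT: "full_transcript",
    Mode.GUIDED: "both",
}


@dataclass(frozen=True)
class SystemPolicy:
    name: str
    mode: Mode
    cost_ceiling_per_task: float  # in whatever unit the team budgets in
    trajectory_capture: str       # "step_trace", "full_transcript", or "both"
    rollback_plan: str
    promotion_approver: str       # who signs off on guided -> agent


def gaps(policy: SystemPolicy) -> List[str]:
    """Return the questions this system has not answered yet."""
    problems: List[str] = []
    if policy.cost_ceiling_per_task <= 0:
        problems.append("no cost ceiling")
    if policy.trajectory_capture != REQUIRED_CAPTURE[policy.mode]:
        problems.append(f"capture should be {REQUIRED_CAPTURE[policy.mode]}")
    if not policy.rollback_plan.strip():
        problems.append("no rollback plan")
    if policy.mode is Mode.GUIDED and not policy.promotion_approver.strip():
        problems.append("no approver for promotion to agent")
    return problems
```

An empty list from gaps means the system has at least stated its answers. Whether the answers are good is still a review conversation, not a type check.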

These are not philosophical questions. They have answers. They affect cost, incident rate, what kind of work you can ethically deploy the system to, and what your engineers become after a year of practicing the discipline you actually instilled.

Most teams haven't answered any of them. Most teams have, instead, picked a side and quietly built the other one anyway.

A small confession to close

I run mostly workflows. Cron jobs, scheduled tasks, classifiers in front of summarizers, deterministic routers that hand off to LLMs at the nodes where it matters. It works. It's cheap. It's debuggable. It does the boring 95% of the work.

But every now and then, the thing in front of me isn't shaped like a graph. A research task where I don't know what I'll find. A diagnostic where the next step depends on what the previous one returned. A piece of work where the value is precisely the willingness to wander. For those, I give Luke an agent loop, a tight budget, a hard step cap, and the tools that match the job. He almost always finishes more cheaply than I expected, and almost always produces something a workflow couldn't have. And then I go back to writing graphs, because most work is still graph-shaped, and pretending otherwise costs money I don't want to spend.

The argument is about the frontier — when is true autonomy worth the cost — but most of the work happens in the boring zone, where workflows are correct, and admitting that is more useful than arguing about it.

This is the fourth issue of Don't Pick a Side. The next one is about Write code vs No code — what it means to stop writing code by hand, and what it doesn't mean. If you build with agents and want the next post when it lands, subscribe — and tell me which tribal debate is annoying you most right now.
