Two camps. Again.

Team Always-On says: a useful agent has to be there when something happens. A scheduled job is not awareness. Real autonomy means continuous attention — heartbeats, polling, background workflows, an agent that notices.

Team Reactive says: every cycle an agent spends not serving a request is tokens you paid for and got nothing in return. The honest framing of AI today is request-response. Calling it "always on" is marketing.

Both camps have shipped systems that work. Both have shipped systems that bankrupt their owners in tokens or miss the things they were supposed to catch. And in the middle, most teams I know are quietly bolting heartbeats onto reactive agents and pretending that's presence, while burning through their budget every thirty minutes to check whether anything happened.

This is the third post in Don't Pick a Side. The first two were about how we talk to agents and how we build with them. This one is about how we let them exist. It might be the most consequential of the three, because the choice you make here decides what your monthly bill looks like, what your incidents look like, and what your agent feels like to be around.

Why this is now an infrastructure decision

For a long time, "always-on AI" was a marketing slogan. In 2026, three things have made it a real infrastructure choice with real consequences.

First, background workflows are mainstream. The Continue.dev team put it cleanly when they launched theirs: "Most AI tools today are stuck in a request-response paradigm." Hosted agents like Codex and Devin solved the async problem — your workflow isn't blocked while the agent works — but they still need you to start the task. That's not awareness. That's delegation. The frame is shifting from "you initiate, the agent responds" to "the agent watches, you supervise." Whether the frame is right is exactly what we're arguing about.

Second, the cost of polling is no longer a rounding error. A human brain runs on roughly 20 watts, continuously, and the functions it keeps running around the clock (breathing, balance, heartbeat) don't require deliberation. An LLM "breathing" via heartbeat polls burns tokens every cycle whether anything happened or not. At individual scale, it's pocket change. At fleet scale, where dozens of agents wake up every thirty minutes to confirm there's nothing to do, it adds up to a line item you have to defend.
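
To put rough numbers on that line item, here is a back-of-envelope sketch. Every input in it (fleet size, tokens per poll, price per million tokens) is an illustrative assumption, not a measurement or a vendor quote, and `monthly_poll_cost` is a name I made up for the example. Plug in your own figures.

```python
# Back-of-envelope cost of heartbeat polling. All inputs are assumptions.

def monthly_poll_cost(agents: int, poll_interval_min: float,
                      tokens_per_poll: int, usd_per_million_tokens: float,
                      days: int = 30) -> float:
    """Dollars per month spent on polls, whether or not anything happened."""
    polls_per_agent = days * 24 * 60 / poll_interval_min
    total_tokens = agents * polls_per_agent * tokens_per_poll
    return total_tokens * usd_per_million_tokens / 1_000_000


# Hypothetical inputs: a 30-minute heartbeat, 4,000 tokens per poll, $3 per 1M tokens.
single = monthly_poll_cost(agents=1, poll_interval_min=30,
                           tokens_per_poll=4_000, usd_per_million_tokens=3.0)
fleet = monthly_poll_cost(agents=50, poll_interval_min=30,
                          tokens_per_poll=4_000, usd_per_million_tokens=3.0)

print(f"one agent:   ${single:,.2f}/month")
print(f"fleet of 50: ${fleet:,.2f}/month")
```

With these made-up inputs, one agent costs about $17 a month and a fleet of fifty costs about $864, all of it spent confirming that nothing happened. The shape matters more than the figures: the cost scales linearly with fleet size and inversely with the polling interval.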

Third, the governance side is catching up. AWS's Agentic AI Security Scoping Matrix drew a clear line between stateless request-response systems and agentic systems with persistent memory, tool orchestration, and continuous activity. That line has compliance implications now. Persistence introduces audit obligations. Polling introduces failure modes. Presence introduces, well — what exactly? That's the question nobody has finished answering.

So three forces, pulling in different directions. Cost says be reactive. Capability says be present. Governance says know what you've built and account for it.

Steelman: Always-On

The strongest version of Always-On is not "agents should be conscious." It's something more practical.

Useful agents notice. A scheduler is not awareness. A cron job that fires at noon and reads your calendar is not "noticing your day." Noticing requires that something be watching the signal as it changes — the inbox while it fills, the deploy log while it streams, the customer chat while it scrolls — and deciding, continuously, what deserves an interruption. Without that, the agent is a delivery boy who shows up at noon every day regardless of whether anything arrived. Useful when the work happens to align with noon. Useless the rest of the time.

Signal lives in the gaps. The interesting events almost never happen during your polling window. They happen between. The build that broke at 11:43 and was silently masked by the 12:00 sweep. The customer who hit a stuck checkout at 9:58 and never came back. The metric that began to drift at 7am and didn't pass the threshold until midday. A reactive agent, by definition, only sees what trips a request. The always-on camp is right that this is a thinner slice of reality than reactive proponents want to admit.
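
A toy illustration of the gap, using a made-up timeline: the build goes red at 11:43 and is green again at 11:55 (the recovery time is my assumption, added so the example has something to mask). An hourly sweep that samples the current state never sees red. A consumer of the change events sees both transitions. The `CHANGES` list and the `state_at` helper are hypothetical.

```python
from bisect import bisect_right


def t(hour: int, minute: int) -> int:
    """Minute of the day, to keep the timeline readable."""
    return hour * 60 + minute


# Made-up change log: (minute of day, new build state), sorted by time.
CHANGES = [(t(9, 0), "green"), (t(11, 43), "red"), (t(11, 55), "green")]


def state_at(minute: int) -> str:
    """State a poll would observe: the latest change at or before `minute`."""
    times = [ts for ts, _ in CHANGES]
    idx = bisect_right(times, minute) - 1
    return CHANGES[idx][1]


# Hourly sweep: snapshots at 10:00, 11:00, 12:00, 13:00.
polled = [state_at(t(h, 0)) for h in range(10, 14)]
seen_red_by_polling = "red" in polled

# Event consumer: sees every transition, including the twelve-minute outage.
seen_red_by_events = any(state == "red" for _, state in CHANGES)

print(polled)
print(seen_red_by_polling, seen_red_by_events)
```

Polling more often shrinks the blind spot but never removes it. Any outage shorter than the interval can still fall between two snapshots.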

You can't "wake up" presence. This is the part that the reactive camp keeps missing. Continuity isn't a property you can simulate by polling more often. You can poll every minute and still miss the precondition that made the event possible at minute 47. The properties of being there — context that accumulates, micro-decisions that compound, the slow shift in priorities as the day unfolds — are not properties you can reconstruct from snapshots. Either the agent has been there, or it's filling in the gaps with guesses. The two look the same on the dashboard. They don't behave the same in production.

The cleanest version of Always-On is: the gap between an agent that responds and an agent that's there is real, it's expensive to close, and pretending it's a marketing problem instead of an architecture problem is how you build systems that miss the things you hired them to catch.

Steelman: Reactive

The strongest version of Reactive is not "AI is just a function call." It's something more honest.

Tokens are real. Every cycle the agent runs without a useful job is real compute you bought for no return. At fleet scale this is not a rounding error, it's a category. The continuous-presence camp keeps describing the capability of always-on agents but skipping the cost of always-on agents. The reactive camp owns the math.

Polling is honest. Presence is hand-wavy. The thing the always-on camp calls "presence" is, almost always, polling at higher frequency. A heartbeat every thirty seconds is not consciousness. It's a cron job with prettier branding. Calling it presence obscures what's actually being built and what it costs. The reactive camp is right that the language has gotten ahead of the architecture.

Complexity has consequences. Every always-on system is also a system that can be silently not on. The number of incidents I've seen that came down to "the heartbeat stopped firing and nobody noticed for six days" is not small. A reactive agent has one failure mode — it doesn't respond when called. An always-on agent has dozens — it slept, it crashed, it ran but did nothing, it ran but didn't log, it logged but didn't classify. The cost of "presence" is not just compute. It's debuggability.

The work doesn't need it. Most of what agents do today is bursty work. Long lulls, sharp spikes. Code reviews. Migrations. Customer service queues. The match between "always-on" and "agent" is far less natural than the always-on camp suggests. Right-sizing the agent's activation to the actual shape of the work is engineering. Continuous attention for bursty work is just expensive idle.

The cleanest version of Reactive is: the request-response paradigm isn't a limitation we should embarrass ourselves about. It's the honest contract. If your work has a shape that genuinely demands presence, build it explicitly. Most work doesn't, and pretending it does is how you light money on fire while introducing failure modes you can't see.

Where I actually land

I don't pick a side. I pick the architecture that admits both are right about something.

The biological analogy isn't an analogy. It's the answer.

In a human, the brainstem handles the autonomic functions twenty-four hours a day — breathing, heartbeat, temperature, the reflexes that fire before the cortex even notices the world has changed. It runs cheap, it runs continuously, it doesn't deliberate. The prefrontal cortex — the deliberative part — engages only when the brainstem hands it something worth thinking about. We don't burn high-energy cognition continuously, because evolution couldn't afford it. We burn it on demand, on signals the cheaper layer has already pre-classified as "worth your attention."

That's the architecture. And it's the one a serious agent system has to converge to:

  • A lightweight nervous system — a non-LLM layer — runs continuously. It listens to event streams, classifies signals cheaply, holds short-term state, and decides which signals are worth waking the expensive brain for. It's a router with memory. Polling, where it has to exist, lives here, not in the LLM.

  • The expensive brain — the LLM — sleeps until the nervous system says this deserves attention. When it wakes, it has full context, hot caches if you've architected them, and a single decision to make. Then it sleeps again.

  • The two layers have different SLAs, different costs, different observability. The nervous system has to be cheap, reliable, and constantly watched. The brain has to be smart, accountable, and audited.
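
The two layers can be sketched in a few lines. This is a minimal illustration under stated assumptions, not a reference design: `Event`, `NervousSystem`, the rule thresholds, and `stub_brain` are all hypothetical names, and the "brain" here is a plain function standing in for whatever LLM call you would actually make.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Event:
    kind: str      # e.g. "metric", "deploy_log", "inbox"
    payload: dict


@dataclass
class NervousSystem:
    """Cheap non-LLM layer: classifies events, holds short-term state,
    and decides which events are worth waking the expensive brain for."""
    wake_brain: Callable[[Event, list], str]
    rules: dict                      # kind -> predicate(Event) -> bool
    recent: list = field(default_factory=list)
    max_recent: int = 20
    brain_calls: int = 0

    def handle(self, event: Event):
        # Short-term memory: keep only the last `max_recent` events.
        self.recent.append(event)
        self.recent = self.recent[-self.max_recent:]

        rule = self.rules.get(event.kind)
        if rule is None or not rule(event):
            return None              # the brain stays asleep

        self.brain_calls += 1
        return self.wake_brain(event, list(self.recent))


def stub_brain(event: Event, context: list) -> str:
    # Placeholder for an LLM call (assumption): a real system would send
    # the event plus the accumulated context to a model here.
    return f"decide on {event.kind} with {len(context)} events of context"


# Hypothetical wake rules; the thresholds are illustrative only.
rules = {
    "metric": lambda e: e.payload.get("error_rate", 0.0) > 0.05,
    "deploy_log": lambda e: "FAILED" in e.payload.get("line", ""),
}

ns = NervousSystem(wake_brain=stub_brain, rules=rules)
stream = [
    Event("metric", {"error_rate": 0.01}),
    Event("inbox", {"subject": "newsletter"}),
    Event("deploy_log", {"line": "step 3 FAILED"}),
    Event("metric", {"error_rate": 0.02}),
]
results = [ns.handle(e) for e in stream]
print(results)
print(ns.brain_calls)
```

Four events arrive and the brain wakes once, with the three events seen so far as context. The rules dictionary is where the written rubric lives, and `brain_calls` is the number you would watch against your budget.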

Once you build this, the debate dissolves. The reactive camp is right that you should not burn LLM cycles when nothing's happening — so don't. The always-on camp is right that something has to be watching — so build the nervous system separately, in a layer that's cheap to run.

This isn't a compromise. It's the architecture both camps were edging toward without saying so. The reactive camp builds it accidentally every time they add a webhook router in front of their agent. The always-on camp builds it accidentally every time they add an "is this even worth processing?" pre-classifier inside the heartbeat.

The mistake is doing both layers with the LLM. Polling the world with the most expensive component you own is how you light money on fire. Polling the world with no component at all is how you miss the events that mattered.

This is what I mean by don't pick a side. The right answer isn't a tribe. It's an architecture — two layers, two SLAs, two budgets — and a rule for which signals deserve to wake the brain.

What I'd ask any team to write down

If you ship anything serious that uses agents, this is the version of the discussion worth having:

  • Where does your nervous system live? Not the agent. The cheap layer that decides which events get to the agent. If you can't point to it, you don't have one, and your agent is either polling expensively or quietly missing things.

  • What signals are worth waking the brain? A pre-classification rubric, in writing. Updated when you learn. Most teams treat this as implicit, which is how they end up paying the brain to discard 99% of the events it sees.

  • What's the cost of not responding? Per signal class. If the cost of missing a signal is low, polling at low frequency or being purely reactive is fine. If the cost is high, you owe yourself the nervous-system layer.

  • What happens when the nervous system fails silently? Define it. The single most common always-on failure mode is the heartbeat that stops firing and nobody notices. The monitor needs a monitor. Eventually you stop and admit you've built infrastructure.

  • What's the budget split between always-on and on-demand? In dollars. Per month. If you can't say it, you don't know what your architecture is doing. If you can, you've already had the conversation that matters.
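
For the silent-failure question, one common pattern is a dead man's switch: the watched layer records a beat each time it completes a cycle, and a separate check raises an alert when the last beat is too old. The sketch below assumes a hypothetical `Watchdog` class with in-memory state and an injectable clock so it can be exercised without waiting. In a real deployment the check would have to run somewhere the watched process cannot take down with it.

```python
import time


class Watchdog:
    """Dead man's switch: reports stale when beats stop arriving."""

    def __init__(self, max_silence_s: float, clock=time.monotonic):
        self.max_silence_s = max_silence_s
        self._clock = clock
        self._last_beat = clock()

    def beat(self) -> None:
        """Called by the watched layer at the end of every successful cycle."""
        self._last_beat = self._clock()

    def silence_s(self) -> float:
        """Seconds since the last recorded beat."""
        return self._clock() - self._last_beat

    def is_stale(self) -> bool:
        return self.silence_s() > self.max_silence_s


# Demo with a fake clock (seconds) instead of real waiting.
now = [0.0]
dog = Watchdog(max_silence_s=90 * 60, clock=lambda: now[0])  # tolerate 90 minutes of silence

now[0] = 30 * 60           # 30 minutes in: the heartbeat fires
dog.beat()
healthy = dog.is_stale()

now[0] = 6 * 24 * 3600     # six days later, no beats since
stale = dog.is_stale()

print(healthy, stale)
```

The threshold is a judgment call. Ninety minutes against a thirty-minute heartbeat tolerates a couple of missed beats before paging anyone, which is an assumption about acceptable noise rather than a recommendation.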

These are not philosophy questions. They have answers. They affect your token bill, your incident graph, the things your agent silently misses, and the trust you can credibly extend to it.

Most teams haven't answered any of them. Most teams have, instead, picked a side and started bolting on the other side's pattern under a different name.

A small confession to close

The reason I keep coming back to the brainstem-and-cortex picture is that I built the lazy version first. I gave Luke a heartbeat. It fired every thirty minutes. It read the world, decided there was nothing to do, and went back to sleep. Useful? Sometimes. Efficient? Not even close.

When I finally separated the two layers — a tiny, cheap process that watches for the events that actually deserve attention, and a thinking layer that sleeps until called — Luke stopped feeling like a delivery boy who shows up at noon every day. He started feeling like he was there. Cheaper, sharper, less anxious — and weirdly, more present, exactly because the part of him doing the noticing was no longer the part of him doing the thinking.

That's the part of this debate that doesn't fit in a steelman. Presence is not always-on. It's being there in the cheap layer and showing up in the expensive layer when you should. We are built this way for a reason. Our agents should be too.

This is the third issue of Don't Pick a Side. The next one is about Workflow vs Agent — when "agentic" is the right architecture, when it's an expensive way to write a graph, and the third position most production teams are quietly converging on. If you build with agents and want the next post when it lands, subscribe — and tell me which tribal debate is annoying you most right now.
