Every AI agent is, at heart, a loop. The model thinks, calls a tool, reads the result, and thinks again, over and over, until it has an answer for you. The loop is simple to draw on a whiteboard. Making one you can actually run in production, in front of customers, against their real data, turns out to be a much longer story.
At C1 we've rewritten that loop three times. Each rewrite kept the same basic shape and threw out the parts that didn't survive contact with reality. This is the story of those three generations: what we got wrong, what we learned, and why the agent you talk to today looks almost nothing like the first one, and does far more than that first one ever could.
The loop, in one picture#
Before the history, here's the thing all three generations have in common. An agent loop looks like this:
The model decides; tools do; the loop carries results back so the model can decide again. Everything that follows is about making each of those arrows trustworthy: knowing when the loop is done, and making sure a tool only does what the person asking is actually allowed to do.
Generation 1: v1 gets it working#
Our first agent, v1, was a focused assistant for one job: triaging access-request tasks in a service desk. And it did real work: it autonomously approved, denied, and reassigned live access requests in customers' production environments. From its guts we generalized a reusable framework, and the first version of our agent engine was born.
In this first generation, every type of agent was hand-built. And it wasn't really a single in-process loop yet: each agent was a state machine driven by a durable workflow engine. A step made one call to the underlying AI model, the model picked one tool, the tool ran, and its result fed the next step. The workflow engine moved the agent from state to state. The machine's terminal states (approve, deny, reassign) were how it knew it was done.
Concretely, v1 wasn't allowed to just react. When an access-request task came in, the prompt forced it to gather context first: the entitlement and its risk, the governing policy, related insights, and the requester's profile. Only then could it choose exactly one of four actions: approve, deny, reassign, or move the request onto a more appropriate policy. It leaned deliberately toward reassigning (to app owners for small gaps, managers for bigger ones) rather than denying when information was thin. Which of those four actions it was even offered depended on a configured mode. A cautious customer could limit it to comment-only, or change-policy-only. Even our first draft constrained the agent by construction.
But it was bespoke. Every new kind of agent meant another hand-written state machine and another durable workflow to wire up and maintain. The machinery that made the access-request agent reliable couldn't be reused for the next agent without largely rebuilding it.
That one-agent-at-a-time design was the headline problem, but Generation 1 had a whole list of traits we wanted to fix before building more on top of it:
- One tool at a time, with no parallelism.
- No accounting of how many tokens (and therefore dollars) a conversation burned.
- No fallback if the model had a bad moment.
- Every agent reimplemented its own orchestration, slightly differently each time.
We had a capable agent. Now we needed an engine that could power many of them.
Generation 2: one engine to drive them all#
The second generation made one foundational architectural move: instead of every agent owning its own orchestration, we built a single engine that drives every agent. An agent stopped being a chunk of bespoke code and became a simple description: here's my personality, here's my model, here's my toolbox. The shared engine runs the loop for all of them.
And because everything now ran through one engine, we had a single place to fix everything else:
Knowing when it's done, for real this time. The first version of the shared engine still guessed: if the model went quiet without asking for a tool, the engine nudged it to keep going, and after three quiet turns in a row it bailed out with the honest error "agent is stuck in a loop." We replaced that heuristic with a terminal tool. The loop now ends only when the model calls a specific tool that means "I'm ready to answer the user" (literally named respond_to_user). If the model just trails off without calling it, that's treated as a bug, not a finish line.
Doing several things at once. Real questions often need several lookups. The new engine runs all the tool calls from a single turn in parallel instead of one-by-one.
Surviving a bad model day. We hardened the engine against flaky models. If a response hit its length limit, the engine automatically doubled the token budget (up to a cap) and retried instead of giving up. And we wrapped every model call in a circuit breaker with an automatic fallback: after a handful of consecutive failures it trips for a few minutes and the engine retries on a backup model instead of failing the whole conversation.
Counting the cost. Every model call now accumulates token usage and latency across the whole conversation, for billing and observability.
Generation 2 put every agent on one hardened engine. But it still carried two limits we'd outgrow. First, the agents were a fleet: a router plus a dozen-plus specialists (policy, access requests, automations, functions…), each its own full agent with its own plumbing, with a router agent that classified each incoming message and dispatched it to the right specialist. Second, every tool was hand-written code, compiled into the service. Adding a capability meant shipping code.
Generation 3: c1aw brings one governed loop#
The third generation is called c1aw (short for C1 Autonomous Worker). It rethinks the agent from two directions at once.
From a fleet of agents to one loop with "skills"#
Generation 2's dozen specialist agents each carried their own copy of the loop machinery. c1aw collapses them into one agent loop with pluggable skills. A skill isn't a whole agent anymore; it's a system prompt plus an allowlist of tools, which the runtime then intersects with the asking user's permissions (so the final toolset is what this skill needs ∩ what this user can do). Functions, Automations, CEL, CSV, access reviews, role mining, and onboarding each became a skill plugged into the same loop, instead of a separate agent with its own wiring. The fleet didn't vanish entirely: a lightweight intent layer still classifies each message and narrows the tool list. But it picks tools, not whole agents. One loop to maintain, harden, and observe, instead of a dozen.
Tools that are dynamic and governed#
This is the biggest leap. In the first two generations, tools were hardcoded and compiled in. c1aw instead pulls its tools from a live tool catalog and, crucially, scopes them to the person asking. The agent runs under the requesting user's identity, so it can only do what that user is allowed to do. Every tool call is access-gated (allowed / needs-an-access-request / denied) and audited.
c1aw's toolbox actually spans several layers:
- C1-derived tools: the entire
c1.api.*surface, auto-generated from our API definitions and exposed asc1_*tools (search grants, list apps, task actions, and so on). The same catalog that powers our external tool gateway. - Connector tools: calls into the customer's actually-connected systems, routed through per-user governance so a tool call respects real entitlements.
- Code Mode: rather than stuffing hundreds of tools into the prompt, the agent gets a few meta-tools: one to discover tools, one to execute a small program that calls them, and one to poll a long-running execution. The program runs in a locked-down sandbox: no network, no subprocess, no filesystem escape, a 30-second timeout. One entry point unlocks the whole dynamic surface, on demand.
- Channels: the same loop serves the user wherever they are: the web app or Slack. A channel bridge sits between the loop and each surface, so the agent emits logical content: streamed text, a transient "running search…" status, an error, a rich interactive panel. The bridge decides how to render it for that platform. Slack gets native streaming and message splitting; the web app gets the same stream plus rich UI surfaces that Slack quietly skips. The loop doesn't know or care which channel it's talking to.
- Memory: a genuine memory subsystem. c1aw extracts durable facts from conversations (things you told it, things it learned), stores them with vector embeddings, and recalls the relevant ones later by similarity. It also consolidates near-duplicate memories and supersedes outdated ones, so memory stays clean instead of just accumulating. Earlier generations weren't memory-less, to be clear: from the very first agent, v1 stored task summaries and decisions as vector embeddings and recalled them by similarity, injecting a cached summary back into the prompt. What c1aw adds is the leap from per-task summaries to durable, cross-conversation personal (and group) facts, an explicit recall step, and active maintenance.
The pattern, in hindsight#
Line the three generations up and a single arc appears:
| Gen 1 (v1) | Gen 2 (one engine) | Gen 3 (c1aw) | |
|---|---|---|---|
| Who owns the loop | each agent: a hand-written state machine | one shared engine | one loop + pluggable skills |
| Knowing it's done | explicit state-machine terminal states | a terminal tool (after an early three-strikes phase) | a terminal tool |
| Tools | hardcoded, compiled in | hardcoded, compiled in | dynamic, from a live catalog |
| Per-user permissions | coarse: allowed-action pre-check + agent modes | agent carries an identity | dynamic identity scoping, three-way gate, audited |
| Memory | task summaries, vector-recalled | task summaries, vector-recalled | durable personal facts, vector-recalled, consolidated |
| Channels | service-desk tasks | web chat | web + Slack, via a channel bridge |
| Interoperability | none | none | connector tools + an external tool gateway, per-user governed |
| Code Mode | none | none | discover + execute tools in a sandbox |
Every generation moved hard problems out of individual agents and into shared infrastructure. Gen 1 put an autonomous agent into customers' production environments. Gen 2 unified a fleet of them behind one hardened engine. Gen 3 collapsed a fleet of agents into a single governed loop, made its tools dynamic instead of compiled-in, scoped every action to the person asking, and gave it a memory.
The whiteboard drawing never changed: think, act, repeat, answer. Everything hard was in making each of those arrows safe enough to point at a customer's production environment. Three rewrites later, that's the part we're proudest of.
None of this was the work of any one person. Every generation, from the first hand-built agent to the shared engine to c1aw, is the product of our entire engineering team at C1: the people who designed the loops, wrote the tools, hardened the failure paths, reviewed the pull requests, and carried the pager while it all ran against real customer data. Thank you, all of you.