Express course · No. 10

An agent is a language model put in a loop: reason, act, observe, repeat — until a goal is met. That loop is astonishing and, left alone, works only about half the time. The skill isn't a smarter model. It's engineering the chain — the steps, the context, the tools, the limits — so the loop is something you can trust.

Essence only · One picture per idea · Reliability over magic

§ 01

Before you build one, get the definition exact — because an agent is a specific, narrow thing, and most of the trouble comes from reaching for it when you didn't need it.

A model in a loop, not a single answer

A person fixing a leak: try something, look at what happened, adjust, try again — not one perfect move, but a cycle of action and feedback until it's done.

A plain LLM call takes input and returns text, once. An agent wraps the model in a loop: it reasons about the goal, takes an action (calls a tool), observes the result, and reasons again — with the model deciding each next step. That loop is what lets it handle open-ended tasks no single prompt could finish, like "investigate this bug and propose a fix."
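
As a rough illustration, the loop can be sketched in a few lines. Everything named here is hypothetical: `fake_model` stands in for a real LLM call, `lookup` is a toy tool, and the dictionary shape of a "decision" is an assumption for this sketch, not any library's format.

```python
from typing import Callable

# Hypothetical tool: a real one would query a system, not a dict.
def lookup(key: str) -> str:
    return {"error_code": "E42: null pointer in parse()"}.get(key, "not found")

TOOLS: dict[str, Callable[[str], str]] = {"lookup": lookup}

def fake_model(goal: str, observations: list[str]) -> dict:
    """Stand-in for an LLM call: picks the next step from what it has seen."""
    if not observations:
        return {"action": "lookup", "arg": "error_code"}
    return {"action": "finish", "answer": f"Cause found: {observations[-1]}"}

def run_agent(goal: str, max_steps: int = 5) -> str:
    observations: list[str] = []
    for _ in range(max_steps):
        decision = fake_model(goal, observations)    # reason
        if decision["action"] == "finish":
            return decision["answer"]
        tool = TOOLS[decision["action"]]
        observations.append(tool(decision["arg"]))   # act, then observe
    return "stopped: step limit reached"
```

The point of the sketch is the shape: the model's output, not your code, chooses the next step, and the loop ends on a finish decision or a step limit.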

Four parts: model, tools, memory, loop

A handyman is more than a brain — they have tools in the van, a notebook of what they've done, and the habit of working a job step by step.

An agent is four things together: a model to reason, tools to act on the world, memory to carry state across steps, and a loop that drives it until the goal is met or a limit stops it. Take away the loop and you have a chatbot; take away the tools and it can only talk. The power — and the risk — lives in the combination.

Autonomy is the whole point — and the whole danger

The difference between an assistant who asks before every move and one you hand a goal and walk away from. The second is far more useful and far easier to regret.

What makes an agent valuable is that it decides for itself how to reach a goal, across many steps, without you in the loop each time. That same autonomy is exactly what makes it risky: it can take a wrong turn, act on a bad assumption, and keep going. Everything else in this course is about getting the value of autonomy without paying full price for the danger.

Most tasks don't need one

You don't hire a general contractor to hang one picture. A screwdriver does it faster, cheaper, and with nothing to go wrong.

An agent is the heaviest tool in the box. A single prompt, structured output, or a fixed sequence of calls is cheaper, faster, and far more predictable. Reach for an agent only when the task genuinely needs many adaptive steps whose path you can't script in advance. If you can write the steps down, write the steps down; don't make a loop guess them.

An agent is a model in a loop with tools and memory. The loop is the power, the autonomy is the risk, and most tasks need neither.

§ 02

Here's the fact that should shape everything: a typical production agent succeeds only a little more than half the time. Understand why, and you understand what you're actually building.

The demo works; the job is the rest

A car that starts perfectly in the showroom tells you nothing about whether it survives a winter of real driving. The showroom is the easy part.

In a 2026 study across thousands of production agents, the aggregate success rate was about 57% — and benchmark scores ran roughly a third higher than real-world results. The demo runs the happy path: clean input, a task it was shaped for. Production is everything the demo filtered out. The demo is free; the reliability is the entire job.

Chains multiply, so small errors compound

A relay race where each runner is 97% reliable: with twenty handoffs, the baton hits the ground almost half the time.

Agents work in long chains — a real task can be twenty dependent steps. Reliability multiplies down a chain: twenty steps at 97% each lands you near 55%. No single "stupid" mistake is needed; the arithmetic alone drags you to a coin flip. The first lever on reliability is making the chain shorter.
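
The arithmetic is easy to check. This small sketch assumes each step succeeds or fails independently of the others, which is a simplification; `chain_success` is a name made up for the example.

```python
def chain_success(per_step: float, steps: int) -> float:
    """Probability that every step in a chain of independent steps succeeds."""
    return per_step ** steps

for steps in (5, 10, 20):
    # At 97% per step: roughly 0.859, 0.737, and 0.544
    print(steps, round(chain_success(0.97, steps), 3))
```

Halving the chain from twenty steps to ten lifts the end-to-end figure from about 54% to about 74% without touching the model.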

The worst errors hide behind a clean answer

A report that's beautifully formatted and confidently wrong — every heading in place, one key number quietly incorrect three pages back.

In a multi-step task, an intermediate mistake can pass a final-output check while corrupting the result. A research agent retrieves the right source, misattributes one fact in step three, and writes a clean summary that's wrong. The final answer looks fine; the middle is broken. This is the failure mode you'll catch least and pay for most.

Reliability is engineered, not prompted

A bridge doesn't hold because the steel is clever. It holds because someone engineered the loads, the joints, and the margins.

You don't fix the 57% with a better prompt or a smarter model. You engineer it: fewer steps, the right context at each one, scoped tools, checks on the intermediate results, and limits on what a wrong turn can do. The model is the raw material; the reliability is what you build around it. That work is the rest of this course.

A typical agent works about half the time. That number isn't a model problem — it's an engineering problem, and it's yours to solve.

§ 03

An agent's working memory is just the text in the window, and across a long task that window fills with noise. Managing it — not enlarging it — is what keeps an agent coherent.

The window is the only working memory

A brilliant investigator who forgets everything each morning and can only act on the notes pinned to the board in front of them.

Between steps, the model remembers nothing except what's in the context window right now. The running goal, what's been tried, the last tool's output — if it's not in the window, it's gone. So an agent's competence at step ten is decided by what you chose to keep in front of it, not by how smart the model is.

Drift is what actually kills long tasks

A game of telephone down a long line — by the end, the message has quietly mutated, and everyone is confidently repeating something slightly wrong.

The dominant failure in long-running agents isn't a dumb model — it's context drift: across many steps the window accumulates stale facts, half-finished reasoning, and contradictions, and the original goal slides out of focus. A bigger window makes this worse, not better — it gives the drift more room. The fix is hygiene, not capacity.

Curate and compress as you go

A good editor doesn't add pages — they cut. At each step they hand you the one page that matters now, not the growing stack of everything so far.

Reliable agents actively manage the window: keep only what this step needs, summarise finished steps into a short running state instead of dragging the full transcript forward, and re-anchor the goal each turn. The model carries the conclusion, not the raw history. Relevance beats completeness, every step.
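
One way to picture this is a function that rebuilds the window from curated pieces on every step. This is a sketch under assumptions: `build_window`, the `GOAL`/`DONE`/`LAST RESULT` labels, and the choice to keep only the last three summaries are all illustrative, not a prescribed format.

```python
def build_window(goal: str, running_state: list[str], last_observation: str,
                 max_state_items: int = 3) -> str:
    """Assemble the prompt for the next step from curated pieces only."""
    recent = running_state[-max_state_items:]        # keep only the newest summaries
    lines = [f"GOAL: {goal}"]                        # re-anchor the goal every turn
    lines += [f"DONE: {item}" for item in recent]    # conclusions, not transcripts
    lines.append(f"LAST RESULT: {last_observation}")
    return "\n".join(lines)
```

In a real system the older summaries would more likely be merged or moved to an external store than simply dropped; the cut-off here just keeps the example short.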

Memory lives outside the window

A long project needs a filing cabinet, not a bigger desk — you store what matters elsewhere and pull back only the folder you need right now.

Because the window is finite, durable memory lives outside it: notes, summaries, and retrievable history in a store, fetched back in when relevant (the same retrieval idea as RAG). This is what lets an agent work across a task too big to hold at once — and what stops it forgetting the beginning by the end.
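
A toy version of such a store, to make the idea concrete. `NoteStore` is hypothetical, and the word-overlap ranking is a deliberately crude stand-in for whatever retrieval a real system would use (embedding search, for instance).

```python
class NoteStore:
    """Hypothetical external memory: notes live here, not in the window."""

    def __init__(self) -> None:
        self._notes: list[str] = []

    def save(self, note: str) -> None:
        self._notes.append(note)

    def fetch(self, query: str, limit: int = 2) -> list[str]:
        """Return up to `limit` notes sharing the most words with the query."""
        words = set(query.lower().split())
        scored = [(len(words & set(n.lower().split())), n) for n in self._notes]
        ranked = sorted((s for s in scored if s[0] > 0), key=lambda s: -s[0])
        return [note for _, note in ranked[:limit]]
```

The agent saves a short note when a step finishes and fetches only the few notes relevant to the current step, so the window stays small however long the task runs.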

An agent is only as coherent as its window. The biggest context window doesn't win — the best-managed one does.

§ 04

Tools are how an agent reaches past text to actually do things. They're also where it gets hands — which is exactly why they need to be scoped, described, and trusted carefully.

Tool use: the model asks, your code acts

A smart assistant who can't open the filing cabinet themselves — but can tell you precisely which drawer and file to pull, and read what you bring back.

With tool use, you describe functions the model may request — search_orders, send_email, run_query — and when it wants one, it returns a structured call, your code runs it, and you hand the result back. The model decides what; your code controls doing it. This is how an LLM reaches beyond words into your systems and the world.
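
The hand-off can be sketched as follows. The JSON shape of the call (`name`, `arguments`) is an assumption made for this example rather than any particular vendor's format, and the body of `search_orders` is a placeholder.

```python
import json

def search_orders(customer_id: str) -> list[dict]:
    # Hypothetical implementation; a real one would hit your order system.
    return [{"order_id": "A1", "customer_id": customer_id, "status": "shipped"}]

REGISTRY = {"search_orders": search_orders}

def execute_tool_call(raw_call: str) -> str:
    """Run a structured call requested by the model and return the result as text."""
    call = json.loads(raw_call)
    name, args = call["name"], call.get("arguments", {})
    if name not in REGISTRY:
        # The model can ask for anything; only registered tools ever run.
        return json.dumps({"error": f"unknown tool: {name}"})
    return json.dumps({"result": REGISTRY[name](**args)})
```

The model only ever produces the text of `raw_call`. Whether anything runs, and what, is decided by your code.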

The tool description is a trust boundary

A new hire who follows the label on every box without question — so whoever writes the labels effectively controls what they do.

The model decides which tool to call largely from the tool's description — so a misleading or malicious description can steer it. Treat the set of tools and their descriptions as part of your security surface, not a convenience. An agent trusts what its tools tell it; make sure that trust is earned.

Least privilege: hand over the minimum

You give the house-sitter a key to the front door, not the safe, the car, and the bank account. Access scoped to the job.

Give an agent the narrowest tools that do the task. A research agent that only needs to read should not hold send, pay, or delete. Scoped tools shrink the blast radius when the loop takes a wrong turn — and it will. Capability you never granted can't be misused, by a mistake or by an attacker.
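
A minimal sketch of scoping, assuming a hypothetical tool table and a made-up `scoped_tools` helper. The tool bodies are placeholders.

```python
ALL_TOOLS = {
    "read_document": lambda doc_id: f"contents of {doc_id}",
    "send_email": lambda to, body: f"sent to {to}",
    "delete_record": lambda record_id: f"deleted {record_id}",
}

def scoped_tools(allowed: set[str]) -> dict:
    """Return only the tools this agent was explicitly granted."""
    return {name: fn for name, fn in ALL_TOOLS.items() if name in allowed}

# A research agent gets read access and nothing else.
research_tools = scoped_tools({"read_document"})

def call(tools: dict, name: str, **kwargs):
    if name not in tools:
        raise PermissionError(f"tool not granted: {name}")
    return tools[name](**kwargs)
```

Because `delete_record` was never handed to the research agent, a wrong turn or an injected instruction that requests it fails at the boundary instead of in your data.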

MCP and the plumbing underneath

New plumbing run through the whole house fast — and some of the valves were never fitted with a lock.

Agents reach tools through connectors — increasingly the Model Context Protocol (MCP). It standardises how an agent gets hands, which is powerful and now a real attack surface: a large share of remote MCP servers have shipped with no authentication at all. If a connector exposes actions, it needs auth, scoping, and an inventory — treat it like the door it is.

Tools give an agent real hands. Scope them to the task, distrust their descriptions, and lock the plumbing — capability without limits is a liability.

§ 05

The instinct is to build one clever agent that does everything. The reliable pattern is the opposite: break the work down, and let the simplest structure that fits each piece do the job.

One agent that does everything does nothing well

A single worker told to be the architect, the plumber, the electrician, and the inspector — versus a small crew of specialists who each do one thing properly.

A sprawling agent with twenty tools and a vague mandate has too many ways to go wrong at every step. A narrow agent with a clear job and a few tools is more reliable, easier to test, and easier to debug. Decompose the work into focused pieces — depth in one job beats breadth across ten.

Plan first, then execute

A good contractor draws up the plan before anyone lifts a hammer — and a cheap apprentice can do most of the hammering once the plan is clear.

A strong pattern is plan-and-execute: one capable model breaks the goal into concrete steps, then cheaper, narrower calls carry them out. This shortens the reasoning chain, makes the agent's intentions inspectable before it acts, and cuts cost — a strong model plans, small models execute.
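
In outline, the pattern might look like this. Both `plan` and `execute` are stand-ins for model calls, and the `approve` hook is a hypothetical way to inspect the plan before anything runs.

```python
def plan(goal: str) -> list[str]:
    """Stand-in for one call to a capable model that returns concrete steps."""
    return [f"gather inputs for: {goal}", f"draft result for: {goal}", "check the draft"]

def execute(step: str) -> str:
    """Stand-in for a cheaper, narrower call that carries out one step."""
    return f"done: {step}"

def plan_and_execute(goal: str, approve=lambda steps: True) -> list[str]:
    steps = plan(goal)
    if not approve(steps):      # the plan is inspectable before anything runs
        return []
    return [execute(step) for step in steps]
```

The expensive reasoning happens once, up front, and the plan is plain data that a person or a validator can reject before any step executes.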

Orchestration is the real architecture

An orchestra isn't a pile of brilliant soloists — it's the conductor and the score deciding who plays when. The coordination is the music.

Once you have several agents or steps, the value moves to orchestration: routing each subtask to the right handler, passing clean context between them, and handling errors when one fails. The model gets the attention, but the coordination layer — who runs when, with what context — is where a multi-step system actually succeeds or falls apart.

A fixed chain beats a loop when the path is known

On a route you drive every day, you don't re-plan it each morning — you follow the known path. You only improvise when the road is actually blocked.

If the steps are predictable, wire them as a fixed chain — call A, then B, then C — not an agent that re-decides the obvious every time. Reserve the adaptive loop for the genuinely open-ended parts. The most reliable systems are mostly fixed pipeline with a small island of agency where it's truly needed.

Don't build one agent that does everything. Decompose the work, fix the predictable parts, and keep the loop small where it earns its place.

§ 06

An autonomous loop will eventually do something you didn't intend. Control isn't about preventing every wrong turn — it's about making sure a wrong turn can't cause real harm.

Bound the loop: steps, time, cost

A taxi meter with a hard ceiling — at a set limit the ride stops, so a wrong turn can't run up an unlimited fare.

The first leash is hard limits: a maximum number of steps, a time budget, a spend cap. Without them an agent can loop forever, repeat itself, or quietly burn a fortune chasing a goal it can't reach. Limits don't make it smart — they make its failures bounded and survivable.
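
A sketch of all three limits in one place. The `Budget` class, its default values, and the flat cost per step are assumptions for illustration; real costs would come from token usage or billing data.

```python
import time
from dataclasses import dataclass, field

@dataclass
class Budget:
    max_steps: int = 10
    max_seconds: float = 30.0
    max_cost: float = 1.00            # in whatever currency unit you track
    steps: int = 0
    cost: float = 0.0
    started: float = field(default_factory=time.monotonic)

    def charge(self, step_cost: float) -> None:
        """Record one step and raise if any limit is now exceeded."""
        self.steps += 1
        self.cost += step_cost
        if self.steps > self.max_steps:
            raise RuntimeError("step limit exceeded")
        if self.cost > self.max_cost:
            raise RuntimeError("spend cap exceeded")
        if time.monotonic() - self.started > self.max_seconds:
            raise RuntimeError("time budget exceeded")

def run_bounded(step_fn, budget: Budget) -> str:
    """Call step_fn until it returns a result or the budget stops it."""
    try:
        while True:
            budget.charge(step_cost=0.05)   # assumed flat cost per step
            result = step_fn()
            if result is not None:
                return result
    except RuntimeError as stop:
        return f"stopped: {stop}"
```

The budget is charged before each step runs, so a step that would cross a limit is never started.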

A human gate on the irreversible

A bank lets a clerk look up anything, but a large transfer needs a second person to sign. Reading is free; consequences get a checkpoint.

Anything the agent can't take back (sending, paying, deleting, publishing, shipping code) gets a human approval or a hard validation in the path. An injected instruction or a bad decision can propose the action, but can't complete it alone. Put the checkpoint at the consequence, not at the very end.
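
A minimal sketch of such a gate. The `IRREVERSIBLE` set and the `run` and `approve` callbacks are hypothetical; in practice `approve` might open a ticket or prompt a reviewer.

```python
IRREVERSIBLE = {"send_email", "pay_invoice", "delete_record"}

def gated_execute(action: str, args: dict, run, approve) -> str:
    """Run reversible actions directly; irreversible ones need approval first."""
    if action in IRREVERSIBLE and not approve(action, args):
        return f"blocked: {action} was not approved"
    return run(action, args)
```

The check sits directly in front of the consequential call, so reads stay fast and only the actions that can't be undone wait for a decision.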

From approving every step to watching the flagged ones

A manager who signs off on every email never scales — the one who sets clear policy and only steps in on exceptions does.

You don't have to choose between rubber-stamping each step and blind autonomy. The mature pattern is human-on-the-loop: the agent runs, you set the policy and watch, and it escalates only the uncertain or risky moments to you. As trust grows, you approve less and monitor more — but you never stop watching.

Log everything, because you'll need to explain it

A flight recorder runs the whole journey — not for the flights that go fine, but for the one where you need to know exactly what happened and why.

An agent makes decisions you didn't see. So record them: every step, tool call, input, and output, kept as an audit trail. When something goes wrong — and with autonomy, something will — the log is the difference between learning what happened and guessing. It's also increasingly what a regulator or a customer will ask to see.
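
As a sketch, an audit trail can be as simple as one JSON line per step. `AuditLog` and its field names are made up for this example, and it keeps entries in memory only; a real trail would go to durable, append-only storage.

```python
import json
import time

class AuditLog:
    """Append-only record of every step, kept as JSON lines."""

    def __init__(self) -> None:
        self.lines: list[str] = []

    def record(self, step: int, tool: str, tool_input: dict, output: str) -> None:
        entry = {"ts": time.time(), "step": step, "tool": tool,
                 "input": tool_input, "output": output}
        self.lines.append(json.dumps(entry, sort_keys=True))

    def replay(self) -> list[dict]:
        """Parse the stored lines back into dictionaries, in order."""
        return [json.loads(line) for line in self.lines]
```

Recording the input alongside the output matters: it is what lets you reconstruct why a step happened, not just that it did.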

You can't stop an autonomous loop from ever being wrong. You can bound it, gate the irreversible, watch it, and log it — so wrong is survivable.

§ 07

The gap between a demo and a product is measurement. You can't trust an agent you haven't tested at the step level — and you can't improve one you're tuning by feel.

Evaluate the steps, not just the answer

Grading a maths exam by the final number alone passes the student who got the right answer through two cancelling mistakes — and you learn nothing about what they actually know.

For agents, a final-output check is not enough — it waves through the broken middle. Step-level evals grade the intermediate reasoning, the tool calls, and the retrievals: did it pick the right tool, pass the right arguments, use the right facts? You measure the trajectory, not only the destination. (The Evals course goes deep on this.)
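
One simple form of step-level eval compares a recorded trajectory with a reference one. This sketch assumes each step is a dictionary with `tool` and `args` keys, and it uses exact matching, which is stricter than many tasks deserve since several paths can be valid.

```python
def grade_trajectory(actual: list[dict], expected: list[dict]) -> list[dict]:
    """Compare each recorded step with the expected tool and arguments."""
    report = []
    for i, want in enumerate(expected):
        got = actual[i] if i < len(actual) else {}   # a missing step fails both checks
        report.append({
            "step": i + 1,
            "tool_ok": got.get("tool") == want["tool"],
            "args_ok": got.get("args") == want["args"],
        })
    return report
```

A run that picks the right tool with the wrong argument shows up as a failed `args_ok` on that step, even if the final answer happens to look plausible.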

Climb the ladder only as far as you must

You don't book a moving truck to carry one box across the room. You scale the machinery to the job, not above it.

There's a ladder: a single prompt, then structured output, then RAG for facts, then tools, then a full agent loop. Each rung adds power, cost, and new ways to fail. Start at the bottom and climb only when the problem forces you. Most features never need the top rung — and an agent where a chain would do is the most common over-build there is.

The agent is a component, not the architecture

A car has a powerful engine, but it sits behind a firewall, fed clean fuel, wrapped in brakes. The engine isn't the car.

Put the agent behind an interface, with validation on what goes in and what comes out, and the freedom to swap models or restructure the loop. The agent is one powerful, unreliable part of your system — not its foundation. Build the system so you could replace the agent without tearing everything down.
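
A sketch of what "behind an interface" can mean in code. The `Answerer` protocol, `StubAgent`, and the length check are illustrative assumptions; real validation would be specific to your inputs and outputs.

```python
from typing import Protocol

class Answerer(Protocol):
    def answer(self, question: str) -> str: ...

class StubAgent:
    """Stand-in for an agent; any object with answer() can replace it."""
    def answer(self, question: str) -> str:
        return f"answer to: {question}"

def ask(agent: Answerer, question: str, max_len: int = 500) -> str:
    """Validate what goes in and what comes out around the agent."""
    if not question.strip():
        raise ValueError("empty question")
    reply = agent.answer(question.strip())
    if not reply or len(reply) > max_len:
        raise ValueError("agent reply failed validation")
    return reply
```

The rest of the system depends only on `ask`, so swapping the model or replacing the loop with a fixed chain changes one class, not every caller.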

Before you ship an agent

- Does this need a loop at all, or would a prompt, a chain, or a single tool call do it more safely?
- How short can the chain be? Every step you remove is reliability you gain.
- What's in the window at each step, and what gets summarised or dropped?
- Are the tools scoped to the minimum the task needs?
- What are the limits (max steps, time, spend) and the gate on irreversible actions?
- What's the step-level eval that tells you it actually works, not just demos?

Smell tests that you over-built

- An agent loop for a task whose steps you could just write down.
- One do-everything agent with a dozen tools and a vague goal.
- Tuning the agent by vibes, with no eval at the step level.
- No limits on steps or spend, and no human gate on the risky actions.
- Dragging the entire history into the window on every step.

Signs you built it well

- The work is decomposed: narrow agents and fixed chains, a small loop only where needed.
- The window is curated and compressed, not a growing dump.
- Tools are least-privilege, and the connectors are authenticated.
- There are hard limits and a human gate on anything irreversible.
- You have step-level evals and a full audit log, and the agent sits behind an interface you could swap.

Building agents isn't summoning autonomy. It's the ordinary engineering — short chains, clean context, scoped tools, limits, and evals — that turns a 57% loop into something you can trust.