← All entries

Agents Are Employees, Not Features.

Notes from a long winter at ClipForge, in which I learned the difference between calling an API and managing one.

I was hiring my first agent in November, although I did not, at the time, know I was hiring. I thought I was writing a function. I had a YAML file open, and a prompt, and a small, sour cup of coffee that had been brought to me with great solemnity by a barista who had, against the odds, spelled my name correctly. I typed, in the manner of all software engineers since the world was made: def summarise(transcript: str) -> str: — and beneath it, in the docstring, I described, in pleasant English, the sort of summary I had in mind.

The function returned a summary. It was a serviceable summary. I closed the laptop and went home, feeling, on the whole, that the matter was settled.

It was not. It was, in fact, the beginning of a long, slow, occasionally humiliating realisation that I had not written a function at all. I had hired somebody. The somebody was, admittedly, a slightly odd somebody — a brilliant young person of unstable temperament, with no memory beyond the last conversation and a tendency, when overworked, to invent facts about French history. But it was a somebody. And what I had failed to do, on that first afternoon, was the first thing one does with a somebody, which is, of course, to manage them.

The temptation of the function call.

Engineers, of whom I count myself one in the slightly fraudulent way that ex-pianists count themselves musicians, are trained to think in functions. A function is a beautiful object. It takes an input. It returns an output. It does not have moods. It does not have, on a Wednesday in January, a strong personal opinion about the political situation in Brazil. It does not, when you ask it the same question twice, give you two different answers, both of which are plausible, neither of which is true.

An agent does all of these things. An agent — by which I mean any sufficiently capable language model placed inside a loop and given access to tools — has, for all practical purposes, the working temperament of a junior employee on her third week at the firm. She is keen. She is competent in flashes. She is, in the small hours of the morning, capable of producing work which makes you wonder, briefly, whether you are about to be replaced. She is also, in equal measure, capable of cheerfully describing, in the same paragraph, two mutually contradictory states of the world.

If you treat her as a function, you will be furious all the time. If you treat her as an employee, you will not.

The first lesson, which is the hardest.

The first lesson, at ClipForge, was that the agents had to have job titles. This sounds, I admit, faintly ridiculous. I had been resisting it for some weeks on grounds of dignity. We held a small meeting in our kitchen — myself and Theo, my co-founder, who is the sort of man who eats apples whole, including the core, with a slightly defiant air — and at the end of the meeting we had, against my better instincts, written six job titles on a whiteboard. Transcript editor. Caption stylist. Thumbnail art director. Hooks specialist. Compliance reviewer. Customer-success liaison.

I thought, that evening, that we had over-engineered the thing. I went to bed faintly embarrassed.

The next morning, the system worked. Not better — that came later — but visibly differently. By giving each agent a title, we had given each agent a scope, and by giving each agent a scope, we had given ourselves a place to look when something went wrong. The thumbnails had become flat and ugly. We did not, as we had previously, mutter darkly about the model. We muttered darkly about the thumbnail art director, who had clearly, over the past forty-eight hours, developed an unexamined fondness for the colour teal.

You cannot fire a function. You can, with some satisfaction, fire an art director.

What an agent's onboarding actually looks like.

I have come, since November, to think of the system prompt — the long, fastidious document that sits at the top of every agent's context — as a sort of induction handbook. It is the thing the agent reads on its first morning, in the small kitchen, with the bad coffee, before anyone has shown it where the bathrooms are. If the handbook is good, the agent does, from day one, broadly the right thing. If the handbook is poor — if it is vague, or contradictory, or written in the language of a man who has not himself recently done the job — the agent will, with a kind of polite ferocity, do the wrong thing for as long as you continue to pay it.

The induction handbooks at ClipForge are, at this point, longer than my university dissertation. They contain, among other things: the precise tone in which our customer-success liaison is to apologise to a furious YouTuber whose video has been mis-captioned; the exhaustive list of brands our compliance reviewer is permitted to mention by name; a paragraph, written by Theo with great solemnity, instructing the thumbnail art director that the colour teal is, on the whole, to be considered a last resort. None of this is glamorous. All of it is the work.

The second lesson, which I am still learning.

The second lesson is that agents, like junior employees, have to be reviewed. Not at scale, in aggregate, with dashboards. Individually. By a person. Over a coffee.

I sit down, every Friday afternoon, with a particular cup of tea and a particular biscuit, and I read — across the previous week — a sample of fifty conversations from each agent. It takes about three hours. It is, I will admit, not the highest-leverage activity I could be doing on a Friday afternoon. It is the most important. I find, every single week, something I had not previously known about the way the agents are, in fact, behaving in the wild. The customer-success liaison has begun, without anyone asking, to sign her e-mails with a small, slightly twee warmly. The hooks specialist has developed an unexamined fondness for the word frankly, which has begun to appear, with mounting confidence, in approximately one in three of his outputs.

I correct the handbook. I close the laptop. I drink, on the whole, more tea.

What this changes about software.

What I am, in a roundabout way, trying to say is that the architectural shift — the one nobody quite has the language for, yet — is not from using AI to using more AI. It is from functions to colleagues. From calling the model to employing it. It is a shift in the unspoken metaphor, and the metaphor matters because the metaphor decides, in advance, which problems are visible and which are not.

If you think of the agent as a function, you will spend your evenings tuning hyperparameters and being puzzled by drift. If you think of the agent as an employee, you will spend your evenings rewriting the handbook, and you will be puzzled by drift much less, because drift is, after all, what employees do. They learn small bad habits. They develop fondnesses for words. They sign their e-mails warmly. They are, in their oddly persuasive way, alive.

I am, as it happens, hiring. Two more agents this month. Their job titles are written on the whiteboard in my kitchen. The induction handbooks are nearly finished. The teal, I am pleased to report, has been spoken about firmly, and the thumbnail art director has, this week at least, behaved.

— B.