Why More AI Agents Do Not Equal Higher Productivity

2026/06/01 01:01

Design your attention the way you would design a system

Original title: The Orchestration Tax
Original author: Addy Osmani
Photo by Peggy

Editor's note: As AI agents become cheaper and easier to access, software development is entering a new phase. The question is no longer whether you can start more agents, but whether humans have enough attention to manage, judge, and merge their output.

The article introduces a useful concept: the "orchestration tax". Starting an agent is cheap, just a prompt or a click. What is expensive is the follow-up: checking whether the results are correct, understanding their impact on the system architecture, resolving conflicts between agents, and finally deciding which code may enter the main branch. This work cannot simply be parallelized. It all has to go back through the same serial resource: human judgment.

The author compares the developer to the "GIL" of an AI agent system: the single lock that caps the final throughput of a concurrent system. Multiple agents can run at the same time, but as soon as the work reaches architectural judgment, code review, or merge-conflict resolution, it has to pass through the developer's brain again. So the more agents there are, the more output they produce, the longer the review queue grows, and the more context switching and cognitive fatigue the developer faces.

This point is easy to overlook in the current wave of AI coding tools: looking efficient and being productive are not always the same thing. A dashboard full of agents creates an illusion of high output, but if developers do not actually understand, review, and integrate these changes, what the system accumulates in the end may be not productivity but technical debt and cognitive debt.

So the real question here is not "how to use more agents" but "how to redesign the workflow around human attention." In the age of agents, the key skill is not just asking questions and assigning tasks. It is knowing which tasks can be handed to machines in parallel and which must stay with human judgment, when to review, and when to stop and refocus on one core problem.

AI is expanding concurrency, but human attention remains the scarcest and least replicable resource in the system. A truly mature agent workflow does not throw every task at the machine. It designs its own attention structure as carefully as one designs a production system.

The following is the original text:

Starting more AI agents has become easy. But running more agents at the same time does not mean there is more of you. Your cognitive bandwidth does not scale with them. All the judgment that actually steers them, evaluates their results, and merges their changes must ultimately still pass through the same serial processor: you.

The so-called "orchestration tax" is essentially the price you pay when you forget this. And the only real fix is to start designing your own attention the way you would design any concurrent system.

I took part in a round-table discussion at Google I/O with Richard Seroter, Aja Hammerly, and Ciera Jaspan on what software development looks like now and how it might evolve. Towards the end, Richard asked us, "What is the most important thing for developers to take away and change after hearing this?"

I said the one thing that has kept coming back to me over the past few months: feeling busy is never the same as having real output. You can run 20 agents at once and feel busy. That does not mean you have delivered 20 agents' worth of work.

Earlier in that conversation, Richard gave this problem a name. He said, "What you're describing is the orchestration tax." You can't manage 20 agents in your head.

He was absolutely right. I want to unpack this concept more fully, because it is not a matter of self-discipline but of architecture.

There is a line I almost said at that round table, and it has stuck in my mind ever since: running multiple agents does not mean there is one more of you in the world.

The asymmetry people don't account for

There is a hidden asymmetry in agent workflows.

Starting an agent is cheap. You just hit a key or write a prompt. Closing the loop on an agent's work is not cheap. Someone has to check whether what it returned is correct and reconcile it with the changes made by other agents.

That someone is you. And there is only one of you.

Last month, I covered part of this problem in "Your Parallel Agent Limit", focusing on the ambient anxiety of not knowing which parallel thread is silently failing. This article is about the structure behind that cost.

Once you start looking at agents as a concurrent system, you realize that the human is just one component of that system. A slow, serial component.

You are the single-threaded resource

If you have written concurrent code, you already have the instinct to understand this. You have just been applying that instinct in the wrong place.

Python has a global interpreter lock, the GIL. You can create as many threads as you like, but only one thread executes Python bytecode at any given moment, because each thread has to acquire this lock first.

You are the GIL of your AI agents.

They can all run at the same time. But whenever their work requires a real understanding of the system architecture or the resolution of a merge conflict, the lock has to be acquired. And there is only one lock: you.
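As a rough illustration of the analogy (not of the GIL itself), the sketch below lets several threads prepare drafts at the same time but funnels every "review" through one ordinary `threading.Lock`. The names `run_agents` and `review_lock` are made up for this example.

```python
import threading


def run_agents(agent_count: int) -> tuple[list[str], int]:
    """Run agents in threads; return (reviewed drafts, peak concurrent reviews)."""
    review_lock = threading.Lock()  # hypothetical stand-in for the single human reviewer
    reviewed: list[str] = []
    in_review = 0
    peak_in_review = 0

    def agent(task_id: int) -> None:
        nonlocal in_review, peak_in_review
        draft = f"draft-{task_id}"  # parallel part: any number of agents can do this at once
        with review_lock:  # serial part: only one draft is judged at a time
            in_review += 1
            peak_in_review = max(peak_in_review, in_review)
            reviewed.append(draft)
            in_review -= 1

    threads = [threading.Thread(target=agent, args=(i,)) for i in range(agent_count)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return reviewed, peak_in_review


drafts, peak = run_agents(8)
print(len(drafts), peak)
```

However many threads are started, the peak number of simultaneous reviews stays at one, which is the whole point of the comparison.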

Amdahl's law makes this precise: the maximum speedup from parallelization is limited by the fraction of the work that still has to be done serially. If a large share of the work cannot be parallelized, you hit a hard ceiling no matter how many cores you throw at it.
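A small sketch of the formula may help here. Amdahl's law gives the speedup as 1 / (s + (1 - s) / n), where s is the serial fraction and n the number of workers. The function name and the 50% serial fraction below are assumptions chosen purely for illustration, not figures from the round table.

```python
def amdahl_speedup(serial_fraction: float, workers: int) -> float:
    """Theoretical speedup under Amdahl's law for a given serial fraction."""
    if not 0.0 <= serial_fraction <= 1.0:
        raise ValueError("serial_fraction must be between 0 and 1")
    if workers < 1:
        raise ValueError("workers must be at least 1")
    parallel_fraction = 1.0 - serial_fraction
    return 1.0 / (serial_fraction + parallel_fraction / workers)


# Assumed example: half of the work is human judgment and cannot be parallelized.
for workers in (1, 2, 8, 20, 1000):
    print(workers, round(amdahl_speedup(0.5, workers), 2))
```

With half of the work serial, the speedup approaches 2x but never reaches it, whether you run 20 agents or 1000.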

In agent-driven development, that serial part is judgment.

Starting 8 agents does not speed up your judgment. It only makes the queue waiting for you longer.

This is an old truth in performance engineering that still surprises many people: optimizing a non-bottleneck does not increase overall throughput. You just pile up more unfinished work in front of the bottleneck.

Adding agents optimizes the part that was never the constraint. The real constraint is the review step, and the system's throughput is exactly the throughput of that step.
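To make the bottleneck argument concrete, here is a toy simulation. The rates are invented for the example: each agent is assumed to produce one change per hour, and the reviewer is assumed to handle three careful reviews per hour.

```python
def simulate_review_backlog(
    agents: int,
    hours: int,
    changes_per_agent_per_hour: int = 1,  # assumed production rate
    reviews_per_hour: int = 3,  # assumed human review capacity
) -> tuple[int, int]:
    """Return (changes merged, changes still waiting) after the given hours."""
    backlog = 0
    merged = 0
    for _ in range(hours):
        backlog += agents * changes_per_agent_per_hour  # producers add work
        reviewed = min(backlog, reviews_per_hour)  # the reviewer is the bottleneck
        backlog -= reviewed
        merged += reviewed
    return merged, backlog


for agents in (3, 8, 20):
    merged, waiting = simulate_review_backlog(agents, hours=8)
    print(f"{agents} agents: merged={merged}, waiting={waiting}")
```

Under these assumed rates, 3, 8, and 20 agents all merge the same 24 changes in 8 hours. Only the waiting pile changes: 0, 40, and 136.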

The tax is the structural gap between what your agents can produce and what you can actually merge. It appears whenever you put a single-threaded resource in charge of a concurrent system.

More effort won't remove a structural ceiling

At that round table I said: I have never felt so productive, and I have never been so tired.

Both feelings are completely real, and they have the same cause.

This fatigue has a very specific source: it is what it feels like to keep a serial processor pinned at 100% with no slack at all.

Every time you go back to an agent you had put out of your mind, you pay for a context switch. You have to clear your head and reload a different context from scratch.

A CPU can do this in microseconds, and even so architects try hard to avoid frequent switching. It takes you several minutes, and you never restore the context perfectly.

Five agents are not five times the work of one. They are five cold-start context reloads, plus a background process in your brain that keeps running and keeps worrying about which agent you should check on next.

You cannot solve a structural constraint with "more effort". This tax always gets paid.

If you try to push through it, it gets paid in another form: either your code review becomes shallower, or you slide into "cognitive surrender", where forming your own judgment is so tiring that you simply accept whatever code the agent wrote.

Either you pay the tax deliberately, or you let it quietly erode your understanding of your own system.

Design your attention the way you would design a system

So you have to treat your attention as a scarce, contended resource.

You would not ignore the bottleneck when designing a distributed system. Give your brain the same respect.

Here are some of the methods that really work for me:

Scale the agent fleet to your review capacity, not to what the UI allows.

A good concurrent system uses backpressure to keep queues from growing without bound. Producers have to slow down to match what consumers can handle.

Your agents are the producers, and your review capacity is the consumer. The right number of parallel agents is the number whose code you can review seriously. For most people, that number is quite low.

AI tools will happily let you start 20 agents, but that is just a UI feature. It does not mean you actually have the capacity to manage them.
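One way to picture backpressure is a bounded inbox: a new agent may only start while there is room in the review queue. This is a minimal sketch built on the standard library's `queue.Queue`. The `ReviewInbox` class and its method names are hypothetical, and the capacity of 3 is an arbitrary assumption.

```python
import queue


class ReviewInbox:
    """Bounded inbox: agents may only start while the reviewer has room."""

    def __init__(self, review_capacity: int) -> None:
        # maxsize makes the queue refuse new items once it is full
        self._pending = queue.Queue(maxsize=review_capacity)

    def try_start_agent(self, task: str) -> bool:
        """Reserve a review slot for the task; refuse if the inbox is full."""
        try:
            self._pending.put_nowait(task)
        except queue.Full:
            return False  # backpressure: the producer has to wait
        return True

    def review_next(self) -> str:
        """Finish one review and free a slot (raises queue.Empty if none wait)."""
        return self._pending.get_nowait()


inbox = ReviewInbox(review_capacity=3)
started = [inbox.try_start_agent(f"task-{i}") for i in range(5)]
print(started)
```

The fourth and fifth tasks are refused until a review is completed, so the queue length is set by the consumer rather than by how many buttons the UI offers.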

Classify your tasks.

When Richard asked me how I handle this, this is what I mentioned. I split tasks into two piles.

The first is relatively independent work that I am willing to hand to agents running in the background in the cloud. These tasks can run asynchronously and usually need only one check at the final gate.

The second is complex tasks, where the real work is the judgment itself, such as a weird bug or an architecture design.

The biggest mistake is trying to parallelize the second category. Working on several complex tasks in parallel does not increase your output. It just causes the lock to be contended over and over, and in the end every result gets worse.
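The two piles can be sketched as a simple triage step. The `Task` type and its `needs_judgment` flag are assumptions for illustration. In practice that call is itself a human judgment rather than something computed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    name: str
    needs_judgment: bool  # hypothetical flag set by the developer during triage


def split_tasks(tasks: list[Task]) -> tuple[list[str], list[str]]:
    """Return (background pile for agents, serial focus queue for the human)."""
    background = [task.name for task in tasks if not task.needs_judgment]
    focus_queue = [task.name for task in tasks if task.needs_judgment]
    return background, focus_queue


background, focus_queue = split_tasks(
    [
        Task("bump dependency versions", needs_judgment=False),
        Task("weird intermittent bug", needs_judgment=True),
        Task("rename config keys", needs_judgment=False),
        Task("new service architecture", needs_judgment=True),
    ]
)
print("run in parallel:", background)
print("one at a time:", focus_queue)
```

The background pile can fan out to agents, while the focus queue is meant to be worked strictly one item at a time.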

Review in batches.

Every context switch costs you a lot. Sitting down and reviewing the results in one go is much cheaper than looking at one, doing something else, and then cold-starting to look at the next.

Give agents a longer leash. Let the work accumulate a little, then process it as a batch.
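A back-of-the-envelope model shows why batching helps. It assumes a fixed reload cost every time you switch to a different context and a fixed cost per review. The 10-minute and 5-minute figures are invented for the example, as are the context names.

```python
from typing import Optional


def review_minutes(order: list[str], reload_cost: int = 10, review_cost: int = 5) -> int:
    """Total minutes spent when reviewing items in the given order of contexts."""
    total = 0
    loaded: Optional[str] = None  # the context currently held in your head
    for context in order:
        if context != loaded:
            total += reload_cost  # cold start: reload the context
            loaded = context
        total += review_cost
    return total


interleaved = ["api", "ui", "api", "ui", "api", "ui"]
batched = sorted(interleaved)  # same reviews, grouped by context

print(review_minutes(interleaved), review_minutes(batched))
```

Under these assumptions the interleaved order costs 90 minutes and the batched order 50, for exactly the same six reviews.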

Use the lock only for judgment.

Do not spend your brain on things a machine can prove for itself. Have the agent write tests that pass, or generate screenshots.

Let agents prove the 80% that is tedious but verifiable. Then your scarce attention only has to go to the 20% that truly needs human judgment.
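As a sketch of that gate, a change only reaches the human once the agent has attached its own evidence. The `Change` fields and the `route_changes` helper are hypothetical and not tied to any real tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Change:
    name: str
    tests_passed: bool  # assumed to be reported by the agent's own test run
    has_screenshot: bool  # assumed evidence for UI-visible changes


def route_changes(changes: list[Change]) -> tuple[list[str], list[str]]:
    """Return (changes ready for human judgment, changes sent back to the agent)."""
    for_human: list[str] = []
    back_to_agent: list[str] = []
    for change in changes:
        if change.tests_passed and change.has_screenshot:
            for_human.append(change.name)  # machine-checkable part is already proven
        else:
            back_to_agent.append(change.name)  # do not spend attention on this yet
    return for_human, back_to_agent


ready, returned = route_changes(
    [
        Change("fix pagination", tests_passed=True, has_screenshot=True),
        Change("new settings page", tests_passed=True, has_screenshot=False),
        Change("refactor auth", tests_passed=False, has_screenshot=True),
    ]
)
print(ready, returned)
```

Only the first change reaches the reviewer here. The other two go back to the agent without costing any human attention.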

Protect your serial time.

The bottleneck deserves your best hours, not the leftover scraps between agent check-ins.

Sometimes the most powerful move is to shut it all down: close the screen crammed with agents, focus on one problem, and hold the lock firmly the whole way through.

Orchestration is not the work itself. It is a cost that surrounds the work.

Aja pointed out that architectural skill is now the most pressing one: you need to know what is the right size for an agent and what is too big for it.

I would add that you are part of that architecture. Your attention has a known, low throughput. The system either respects that number or works around it by quietly lowering your standards.

Being busy doesn't mean being productive

This matters because this failure mode is almost invisible to you.

Twenty running agents give you a feeling of productivity. The dashboard is full and everything is moving. But that feeling is decoupled from actually merging high-quality code into the main branch.

You can be running flat out and still have little real output. From the inside, the two feel almost identical.

Ciera referred to Margaret-Anne Storey's research on debt. We talked about technical debt and about cognitive debt.

Failing to pay the orchestration tax lets you accumulate both kinds of debt.

You merge something you did not really read. Your mental model of the codebase goes completely out of date. None of these problems show up on today's dashboard. They show up when production breaks, and you look at the system and suddenly realize you do not know how it works.

So the real conclusion is this: starting agents is not the skill. Anyone can run 20.

The real skill is designing the system around a serial resource that cannot be cloned and cannot be parallelized.

That resource is your attention.

Design it the way you would design any critical component you depend on in production.

[Original Link]
