
Claude Code v2.1.172 Unlocks Recursive Sub-Agents. My Fleet Found Three Walls.
Recursive sub-agents are a real upgrade, and after weeks of running CLI agent fleets in tmux, I can tell you exactly where the orchestration breaks.
Fixed foreground subagents spawning unbounded nested chains; they now respect the same 5-level depth limit as background subagents
- Recursion is real, but it is not magic. v2.1.172 lets a sub-agent fan out its own sub-agents five levels deep, which means level two can quietly multiply your concurrency and your bill at the same …
- Cold prompt-cache is the first wall. Fire N agents cold at once and every one misses the shared prefix. Fire one, wait for its stream, then fan out, and the rest read the cache at roughly a tenth o…
- Bursty fanout drops sockets. A 7-way simultaneous launch left 6 of 7 dead with a closed-socket error after 200 to 500 seconds, with zero output files written. Staggering to 3 or 4 at a time landed …
- The supervisor is the silent cost. A fat orchestrator reading 100K-token panes re-bills its whole context every turn. Off the flagship meter is not the same as cheap.
For the past several weeks I have run seven coding agents at once on a single Linux box, each one a separate interactive CLI session, all supervised from one terminal. Not a benchmark, not a staged demo. Real work: drafting, refactoring, and fixing tests across a handful of repos while I watch a board that summarizes what each session is doing. That setup taught me what multi-agent orchestration actually buys you, and the three places it breaks every single time.
The news peg is concrete. On June 10, 2026, Claude Code shipped [email protected], and the line in the official changelog reads plainly: sub-agents can now spawn their own sub-agents, up to five levels deep. A restriction that had held since sub-agents first shipped quietly ended. The patches since then tell the real story. @2.1.181 on June 17 fixed foreground subagents spawning unbounded nested chains. @2.1.187 on June 23 fixed depth tracking so resumed and forked subagents count toward the cap. The newest build, @2.1.195, landed June 26. When a vendor ships a feature and then spends two weeks hardening the guardrail around it, that guardrail is load-bearing. The v2.1.172 release notes list the full set of changes that arrived alongside it.
Recursion is a genuine upgrade. A planner agent can break a job into pieces, hand each piece to a worker, and each worker can split its own piece again. The point is context hygiene: the noisy work happens in a sub-session the main conversation never has to read. I run exactly this shape in my setup. One supervisor on the flagship model holds the plan, the workers run on a cheaper volume model, and the supervisor only ever sees short summaries instead of full transcripts.
Now the part the launch posts skip. Here are the three walls.
Wall one: cold prompt-cache on parallel fanout
The single biggest cost lever in a fleet is the prompt cache, and the naive way to fan out destroys it. If you fire N agents cold at the same instant, every one of them writes the shared system prefix from scratch and pays full freight. The cache only helps the agents that arrive after that prefix is already hot. A simultaneous cold burst is the worst case, because no one warms it for anyone else.
The fix is a warm-up. Fire one agent first, wait for its stream to actually start, then release the rest. The first call hydrates the shared prefix; the followers read it at roughly a tenth of the input cost. This is not theory, it is the difference between a parallel run that is cheaper than serial and one that is far more expensive.
# WRONG: all cold at once, every agent misses the shared cache
fire agent-1 agent-2 agent-3 agent-4 agent-5 agent-6 agent-7
# RIGHT: warm one, then fan out
fire agent-1 && wait_for_stream agent-1 # hydrates the shared prefix
fire agent-2 agent-3 agent-4 # these read the cache at ~10%
# trust nothing: confirm the cache actually hit
# each agent's usage should show cache_read_input_tokens > 0
Recursion makes this worse, not better. A sub-agent that fans out its own children is a second fanout point, and if those children launch cold you have paid the cold-cache tax twice. Five levels deep, an un-warmed tree can write that prefix dozens of times before it does a minute of useful work.
Wall two: sockets drop on bursty fanout
This one cost me a full afternoon on June 15. I fired a 7-way simultaneous burst of worker sessions. Six of the seven came back with API Error: The socket connection was closed unexpectedly, 200 to 500 seconds in, after emitting somewhere between 15 and 200 tokens each. None of them wrote their output files. The orchestrator cheerfully reported six "completed" agents that had produced nothing on disk.
A high count of concurrent agent sockets saturates something upstream and the connections drop together. The fix is boring and reliable: stagger. I cap concurrency at three to four at a time and wait for a batch to land before releasing the next one. Re-firing the missing six in batches of three landed clean on every attempt.
The operator lesson underneath is sharper than the fix. An agent that "finishes" with a socket error and a tiny token count wrote nothing. Always check the disk for each agent's expected output and re-fire the gaps, because the supervisor's own status line will lie to you. Recursive spawning multiplies the burst risk, because one level-two agent can turn your tidy batch of four into a sudden wave of sixteen.
Wall three: the supervision tax
This is the cost nobody budgets for. The supervisor is not free. Every turn, it reads the panes, and reading a 100K-token pane re-bills that whole context on the next turn. A fat orchestrator that keeps slurping full agent transcripts is the most expensive thing in the whole rig, quietly, because it never shows up in the list of workers.
The trap is the phrase "off the flagship meter." A persistent interactive fleet of volume-model workers is not cheap just because it is not burning your most expensive model. It still burns the weekly usage envelope on a high-tier plan, and on a real Indian home broadband line the wall-clock cost is your whole evening, not just tokens. For an operator counting every rupee of compute, the lever that matters is tier, not headcount. Cheap workers spawning more cheap workers is fine. A fat supervisor reading everything is the leak.
Here is the shape that actually holds up.
# each worker in its own tmux session, one pane each
for i in 1 2 3; do
tmux new-session -d -s "agent-$i" "claude --model sonnet"
done
# read a worker WITHOUT stealing focus (the supervisor stays put)
tmux capture-pane -t agent-2 -p | tail -40
# inject a prompt with send-keys, because the GUI drops synthetic key events
tmux send-keys -t agent-2 "summarize the failing test in src/api, 5 lines" Enter
Why tmux and not a pretty GUI driver? Because the terminal widget drops synthetic key events, the window compositor blocks programmatic focus, and the kernel ships with terminal injection disabled by default. The tmux send-keys and capture-pane verbs operate below all of that, so the supervisor can read and steer a backgrounded session without ever touching the mouse. I tested the GUI path first and abandoned it; the terminal path is the only one that survives a backgrounded window.
The smaller landmines
Three more that bit me, worth one line each.
| What broke | Why | Fix |
|---|---|---|
| Build failed mid-run | tsc globs the whole project and read a half-written agent file |
Confirm every agent finished on disk before any build |
| Sibling edits vanished | One worker ran git checkout and reverted everyone's uncommitted work |
Forbid all git-state ops in agent briefs; commit between waves |
| "can't find pane" | Targeting a pane by a collided project name builds a broken id | Address the raw tmux session name, not the friendly label |
Each of these came out of a real wave, and each one is the kind of failure that looks like a model bug until you read the actual error. The build break was a syntax error on a line an agent had not finished writing. The vanished edits were uncommitted sibling work that a confused worker scrubbed with one git checkout. The missing pane was a name collision in my own targeting code.
The verdict
Recursive sub-agents are a real step forward, and @2.1.172 is the right call. The five-level cap is not a limitation to route around; it is the only thing standing between you and an unbounded, socket-dropping, cache-cold spawn storm. Orchestration buys you parallel throughput and clean context isolation. It charges you back in cold caches, dropped sockets, and a supervisor that eats your budget while you are not looking. Warm the prefix before you fan out, stagger to three or four, keep the supervisor thin, and pick your models by tier. Do that, and a fleet genuinely beats a single agent. Skip it, and a parallel run costs more, in money and in wall-clock, than doing the work one careful session at a time.
More from the same beat.
GLM-5.2 Cleared the Six Hard Tasks I Use to Vet Any Cheap Model
A new open-weights model matched my flagship on objective hard tasks. The battery I run before trusting any cheap model in production did not change.
- A leaderboard rank is a reason to test a model, not a reason to trust it. I keep a fixed battery of objective hard tasks with execution-checked answers and run it on every cheap or open release bef…
GLM-5.2 Ships 753B Open Weights. My GTX 1660 Holds 6 GB.
The most powerful open-weight model of mid-June 2026 needs a cluster. A 4B model on a 6 GB card taught me what that headline leaves out.
- Open weights and a model you can run are two different claims. GLM-5.2 ships MIT weights at over 750 billion parameters, and none of that helps a 6 GB card.
I Burned 90% Of GitHub's Free CI Minutes. Here's The Escape.
A real multi-repo empire eats 2000 free Actions minutes a month. When you hit zero, deploys stop firing silently. The fix is not paying per minute.
- A multi-repo solo operator will exhaust 2000 free Actions minutes a month, not might, will. I hit 1800 of 2000 across one account with three active content repos, no macOS or Windows multiplier, ju…