💻 AI Coding · advanced

Claude Code Subagents in Practice: Fork Flag, Cache Leak, Worktree Trap

I run large parallel subagent fanouts daily. Here are the three things that decide whether you fit more agents in your cap or get garbage from collisions.

By Aditya Sharma · Jun 1, 2026 · 7 min read

Claude Code subagent fanout diagram showing fork-shared prompt cache and isolated git worktrees per parallel writer

The first time I fanned out a real fleet of subagents in Claude Code, I learned two lessons in one night. First, a 19-agent fanout slammed straight into my subscription cap because every agent cold-started its own context. Second, when forks all write to the same git repo, they swap each other's HEAD out mid-commit and you get commits landing on the wrong branch.

Neither failure is obvious from the docs, and both have clean fixes. Together with a third trap, a cache leak in one common subagent shape, they are what's worth internalizing before you scale a fanout past three or four agents.

How subagents actually share (or don't share) cache

A subagent in Claude Code runs in its own context. The parent dispatches a task, the subagent does the work in isolation, and the result comes back. That isolation is the point. It keeps the parent's window clean.

The catch is the prompt cache. Prompt caching matches on a byte-exact prefix: send the same leading tokens again and the model reads them at a fraction of full input cost instead of reprocessing them. A subagent that cold-starts builds its own fresh prefix and pays full input price for everything it loads. Spawn nineteen of those and you are paying for nineteen fresh contexts. This is exactly how my early fanout burned roughly 950K input tokens (nineteen contexts at around 50K each) and hit the cap mid-run.
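
To make "byte-exact prefix" concrete, here is a toy model. It compares two requests block by block and counts how much of the new one could be served from the old one's cache. The block labels are made up, and the real service caches at breakpoints rather than block by block, so treat this as the intuition only: a change anywhere early invalidates everything after it.

```python
from typing import Sequence


def shared_prefix_len(cached: Sequence[str], request: Sequence[str]) -> int:
    """Count leading blocks the new request shares exactly with the cached one."""
    n = 0
    for a, b in zip(cached, request):
        if a != b:
            break  # first mismatch ends the reusable prefix
        n += 1
    return n


def split_cost(cached: Sequence[str], request: Sequence[str]) -> tuple[int, int]:
    """Return (reused, reprocessed) block counts under an exact-prefix rule."""
    reused = shared_prefix_len(cached, request)
    return reused, len(request) - reused


parent = ["<system>", "<tools>", "<CLAUDE.md>", "<task-1>"]
fork = ["<system>", "<tools>", "<CLAUDE.md>", "<task-2>"]     # only the tail differs
edited = ["<system-v2>", "<tools>", "<CLAUDE.md>", "<task-2>"]  # first block changed

print(split_cost(parent, fork))    # (3, 1)
print(split_cost(parent, edited))  # (0, 4)
```

A fork that only appends its own task keeps the whole shared head; a cold start, or anything that edits an early block, pays for all of it again.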

The fix is to make forked subagents inherit the parent's already-cached prefix instead of rebuilding it.

Getting more agents per cap

The lever is an environment flag: CLAUDE_CODE_FORK_SUBAGENT=1. With it set, forked subagents share the parent's prompt cache and read the shared prefix at a small fraction of full input cost rather than cold-starting. Export it in the shell that launches claude:

export CLAUDE_CODE_FORK_SUBAGENT=1

Put it in your ~/.bashrc so every interactive claude session picks it up. The flag is recent, so check your version first with claude --version. If forking does not behave as expected on an older build, update before you debug anything else, because the behavior is version-gated and the exact gate moves over time.

The numbers from my own fanout make the case. The run that cold-started everything cost roughly 950K input tokens. With the fork flag enabled, the same shape comes out closer to 412K: one fork pays the full prefix (around 48.7K), and each of the remaining eighteen reads the shared prefix cheaply (around 20.2K each) instead of cold-starting. That works out to about 2.3x more agents fitting in the same cap window. Same work, less than half the input bill.
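
The arithmetic behind those figures, using the per-agent numbers from my run. Your prefix size will differ, so plug in your own values.

```python
def fanout_input_k(agents: int, first_k: float, warm_k: float) -> float:
    """Input tokens (thousands) when one fork pays the full prefix and the rest read it warm."""
    return first_k + (agents - 1) * warm_k


AGENTS = 19
cold_total = AGENTS * 50.0                       # every agent cold-starts at ~50K
fork_total = fanout_input_k(AGENTS, 48.7, 20.2)  # one full prefix + 18 warm reads

print(f"cold:   {cold_total:.1f}K")                           # cold:   950.0K
print(f"forked: {fork_total:.1f}K")                           # forked: 412.3K
print(f"ratio:  {cold_total / fork_total:.2f}x")              # ratio:  2.30x
print(f"bill:   {fork_total / cold_total:.0%} of cold input")  # bill:   43% of cold input
```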

One ordering detail decides whether this actually lands. The cache has to exist before the rest of the fleet reads it. So warm it first, then fan out:

Fire the first subagent call and wait for its stream to start. That seeds the shared cache.
Only then dispatch the remaining agents in parallel. They read the warm prefix instead of all racing to write cache at the same moment.

Skip the warmup and you get a thundering herd: every agent writes cache, nobody reads it, and you are back to cold-start economics. Verify it is working by checking cache_read_input_tokens in a subagent's response usage. After warmup it should be a large share of total input, not near zero.
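
A small helper for that check, assuming you have the usage object as a dict with the Anthropic Messages API field names. In that schema `input_tokens` counts only the uncached tail, and cache writes and reads are reported separately, so the total input is the sum of all three.

```python
def cache_read_share(usage: dict) -> float:
    """Fraction of a request's input tokens that were served from cache."""
    read = usage.get("cache_read_input_tokens", 0) or 0
    written = usage.get("cache_creation_input_tokens", 0) or 0
    uncached = usage.get("input_tokens", 0) or 0
    total = read + written + uncached
    return read / total if total else 0.0


# Illustrative numbers, not from a real run.
warm = {"input_tokens": 1200, "cache_creation_input_tokens": 0, "cache_read_input_tokens": 46000}
cold = {"input_tokens": 800, "cache_creation_input_tokens": 48000, "cache_read_input_tokens": 0}
print(f"{cache_read_share(warm):.0%}")  # 97%
print(f"{cache_read_share(cold):.0%}")  # 0%
```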

If you run fanouts from cron (a nightly research job, a morning batch), remember that cron does not load ~/.bashrc. Set CLAUDE_CODE_FORK_SUBAGENT=1 in the crontab itself or in the wrapper script that cron calls, or the flag is silently missing and every scheduled run cold-starts.
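
A minimal sketch of the wrapper-script option. The filename `fanout_wrapper.py` is hypothetical, and the command it runs is whatever claude invocation your job already uses; the wrapper only guarantees the flag is in the child's environment regardless of what cron loaded.

```python
import os
import subprocess
import sys


def job_env(base=None) -> dict:
    """Copy the environment and force the fork flag on; cron never read ~/.bashrc."""
    env = dict(os.environ if base is None else base)
    env["CLAUDE_CODE_FORK_SUBAGENT"] = "1"
    return env


def run_with_fork_flag(cmd: list[str]) -> int:
    """Run the scheduled command (your own claude invocation) with the flag set."""
    return subprocess.run(cmd, env=job_env()).returncode


if __name__ == "__main__" and len(sys.argv) > 1:
    # cron calls: /path/to/fanout_wrapper.py <command> [args...]
    sys.exit(run_with_fork_flag(sys.argv[1:]))
```

Cron then invokes the wrapper with the real command as its arguments, and the exit code passes straight through so cron's failure mail still works.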

The cache leak that bit me

Even with caching configured right, one subagent shape leaks it badly: a general-purpose subagent doing a long run of web calls.

I caught this on a general-purpose subagent that was verifying a list of social handles and RSS URLs. It ran 112 turns at a 0% cache hit rate. Every single turn wrote fresh cache and read none of it back. At Opus rates that was about $45 of pure waste, all of it 5-minute ephemeral cache writes that never got reused.

The cause is the prefix again. A general-purpose subagent carries a large tool array because every tool is available to it. When WebFetch and WebSearch results flow back into that context, something in the prefix reshapes between turns, so the byte-exact match breaks and every turn writes a brand-new prefix. Writing cache at a premium without ever reading it back is strictly worse than running with no cache at all.
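
You can catch this shape from per-turn usage before it gets expensive. The sketch below computes a run's hit rate, flags runs that keep writing cache without reading it back, and estimates input cost relative to running with caching off. It assumes the 1.25x five-minute cache-write and 0.1x cache-read multipliers Anthropic publishes (check current pricing), and the leak thresholds are arbitrary values of mine, not official guidance.

```python
from dataclasses import dataclass

WRITE_MULT = 1.25  # assumed 5-minute cache-write price relative to base input
READ_MULT = 0.10   # assumed cache-read price relative to base input


@dataclass
class TurnUsage:
    input_tokens: int
    cache_creation_input_tokens: int
    cache_read_input_tokens: int


def _total(t: TurnUsage) -> int:
    return t.input_tokens + t.cache_creation_input_tokens + t.cache_read_input_tokens


def run_hit_rate(turns: list[TurnUsage]) -> float:
    total = sum(_total(t) for t in turns)
    return sum(t.cache_read_input_tokens for t in turns) / total if total else 0.0


def looks_like_leak(turns: list[TurnUsage], min_turns: int = 10, max_hit_rate: float = 0.05) -> bool:
    """A long run that keeps writing cache but (almost) never reads it back."""
    writes = sum(t.cache_creation_input_tokens for t in turns)
    return len(turns) >= min_turns and writes > 0 and run_hit_rate(turns) <= max_hit_rate


def cost_vs_no_cache(turns: list[TurnUsage]) -> float:
    """Input cost relative to sending the same tokens with caching off (1.0)."""
    base = sum(_total(t) for t in turns)
    billed = sum(
        t.input_tokens + WRITE_MULT * t.cache_creation_input_tokens + READ_MULT * t.cache_read_input_tokens
        for t in turns
    )
    return billed / base if base else 1.0
```

A run where every turn rewrites its prefix lands above 1.0 under these multipliers, which is the "strictly worse than no cache" case in numbers.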

The fix is to not use a wide-open general-purpose subagent for many sequential web calls. Pick one of three instead, in order of preference:

Typed subagent. If a tightly-scoped agent fits the job (a reviewer, a test runner), use it. Typed subagents have a tight tool schema, so their prefix is stable and they cache fine.
Inline. Do the web work in a single longer parent turn instead of spawning a churning subagent. Fewer turns means more prefix reuse.
No model in the loop. If the task is mechanical (does URL X return 200?), drop the agent entirely and use a plain requests.head() in Python. A status check does not need a language model.
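
For the no-model option, here is the same idea with only the standard library (`urllib.request` with a HEAD request, in place of `requests.head()`), run over a list of URLs with a small thread pool. It returns each URL's status code, or `None` when the request never got an HTTP answer.

```python
import urllib.error
import urllib.request
from concurrent.futures import ThreadPoolExecutor
from typing import Optional


def head_status(url: str, timeout: float = 10.0) -> Optional[int]:
    """HTTP status for a HEAD request, or None if the server could not be reached."""
    req = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code  # 4xx/5xx arrive as exceptions but still carry the code
    except OSError:
        return None  # URLError, DNS failures, and timeouts all subclass OSError


def check_urls(urls: list[str], max_workers: int = 8) -> dict[str, Optional[int]]:
    """Check many URLs in parallel; results keep the input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(zip(urls, pool.map(head_status, urls)))
```

A few servers answer HEAD with 405, so you may want to retry those with a GET before marking them dead.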

The pattern that works when you genuinely need judgment over web data: gather the pages first (inline, or via search in the parent), then feed everything as one batched context to a single agent call that makes one decision. One fat turn caches; a hundred thin web-churning turns do not.
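
One way to assemble that single batched context, as a sketch. The `Page` type, the page tags, and the per-page character cap are my own choices for illustration, not a format Claude requires. The stable instruction goes first and the question last, so the shared part of the prompt stays at the front.

```python
from dataclasses import dataclass


@dataclass
class Page:
    url: str
    text: str


def build_batched_prompt(question: str, pages: list[Page], max_chars_per_page: int = 20_000) -> str:
    """One context: instruction first, every gathered page next, the single ask last."""
    parts = ["You are reviewing the pages below. Make one decision for the whole set."]
    for i, page in enumerate(pages, start=1):
        body = page.text[:max_chars_per_page]  # crude cap so one huge page can't crowd out the rest
        parts.append(f'<page index="{i}" url="{page.url}">\n{body}\n</page>')
    parts.append(f"Question: {question}")
    return "\n\n".join(parts)
```

The pages come from whatever gathering step you ran first; the agent sees them once, in one turn, and answers once.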

Parallel forks that write files

The cache traps cost money. This one costs correctness.

When you fire several forks that all write to the same git repo, they share one working tree. They start stepping on each other immediately. In one round of five parallel forks against a shared repo, three of the five reported the same failure: a sibling fork checked out a different branch mid-flight, so the affected fork's HEAD got swapped out from under it and its commit landed on the wrong branch ref. One fork even ran a reset over another fork's commit. Recovery meant cherry-picks and force-with-lease pushes, which is exactly the kind of cleanup you do not want after a clean-looking fanout.

The fix is to give each writer its own git worktree, pre-staged before launch. Do it before anything fires, not as recovery:

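# One worktree per parallel writer: check out feat/branch-N if it already exists,
# otherwise create it from the shared repo's current HEAD.
for i in 6 7 8 9 10; do
  git -C ~/repos/my-shared-repo worktree add ~/fork-$i-repo feat/branch-$i 2>/dev/null \
    || git -C ~/repos/my-shared-repo worktree add -b feat/branch-$i ~/fork-$i-repo
done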

A worktree is a separate working directory backed by the same repo, with its own checked-out branch. Each fork gets a private tree, so no fork can swap another's HEAD. In each fork's prompt, point it at its own path: cd ~/fork-6-repo instead of the canonical repo path. Once every fork is done, the orchestrator merges the branches, either one at a time or as a single octopus merge.

After the merge, run your typecheck on the merged result (tsc --noEmit or whatever your project uses). A merge can auto-resolve overlaps that are actually wrong, and the typecheck is what surfaces it before you push. If you forgot to pre-stage and a fork is already in trouble, git worktree add /tmp/recovery feat/the-branch is the in-fork escape hatch.
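
A sketch of that final orchestrator step, assuming it runs in the canonical repo with the target branch checked out. It builds the merge commands (sequential by default, or one octopus merge) with the typecheck last, then runs them and stops at the first failure. The branch names and the `npx tsc --noEmit` command are examples; substitute your own.

```python
import subprocess


def merge_plan(branches: list[str], typecheck: list[str], octopus: bool = False) -> list[list[str]]:
    """Commands the orchestrator runs after every fork has finished."""
    if octopus:
        steps = [["git", "merge", "--no-edit", *branches]]  # one merge commit, many parents
    else:
        steps = [["git", "merge", "--no-edit", b] for b in branches]
    # Typecheck the merged tree last: wrong auto-resolutions only show up here.
    return steps + [typecheck]


def run_plan(repo: str, plan: list[list[str]]) -> None:
    for cmd in plan:
        subprocess.run(cmd, cwd=repo, check=True)  # raises on the first failing merge or typecheck


plan = merge_plan(["feat/branch-6", "feat/branch-7"], ["npx", "tsc", "--noEmit"])
# run_plan("/path/to/my-shared-repo", plan)  # push only if this completes
```

Sequential merges make it easier to see which branch caused a conflict; an octopus merge gives a tidier history but refuses to proceed if any pair conflicts.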

You only need this when forks share git scope. A single fork, or forks working on fully separate repos, do not need worktree isolation. The moment two or more forks touch the same repo, pre-stage. The same pre-staged-worktree rule keeps a fleet of interactive CLI agents driven from tmux from clobbering each other's commits when two sessions share one repo.
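
The rule fits in a few lines, if you want the orchestrator to apply it automatically. The sketch takes a mapping of fork name to repo path, normalizes the paths so `~/…` and trailing slashes don't hide a shared repo, and returns the forks that need a pre-staged worktree. The fork names are hypothetical.

```python
import os
from collections import Counter


def forks_needing_worktrees(fork_repos: dict[str, str]) -> set[str]:
    """Fork name -> repo path in; the forks that share a repo with another fork out."""
    resolved = {f: os.path.realpath(os.path.expanduser(r)) for f, r in fork_repos.items()}
    counts = Counter(resolved.values())
    return {f for f, repo in resolved.items() if counts[repo] >= 2}


print(sorted(forks_needing_worktrees({
    "fork-6": "/srv/shared",
    "fork-7": "/srv/shared/",  # same repo, different spelling
    "fork-8": "/srv/other",
})))  # ['fork-6', 'fork-7']
```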

A real fanout I ran

Here is the shape that has been reliable for me: parallel writers, each owning exactly one file.

The job was generating a batch of distinct content files in one repo. I pre-staged a worktree per writer, then had each fork write only its own assigned file. No two forks touched the same path, which killed both the worktree contention and any content-collision problem in one move. I warmed the cache with the first dispatch, waited for its stream, then fanned out the rest with CLAUDE_CODE_FORK_SUBAGENT=1 set so they all read the shared prefix. The orchestrator merged at the end and ran the typecheck.

The principle that makes it work: assign disjoint scope up front. One file per agent, one worktree per agent, zero overlap. When the scope is disjoint, parallelism is close to free. When it overlaps, parallelism manufactures bugs.
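
Disjoint scope is easy to enforce before dispatch. This sketch maps each agent to its own worktree path (following the ~/fork-N-repo naming above) and its one file, and refuses to build a plan if two agents would write the same path. The agent names and file paths are hypothetical.

```python
from pathlib import PurePosixPath


def plan_writers(assignments: dict[str, str], worktree_root: str = "~") -> dict[str, dict[str, str]]:
    """One file and one worktree per agent; raise on any overlap instead of letting forks fight."""
    owners: dict[str, str] = {}
    for agent, path in assignments.items():
        key = str(PurePosixPath(path))  # collapses "a//b" and "a/./b" so near-duplicates still collide
        if key in owners:
            raise ValueError(f"{path!r} assigned to both {owners[key]!r} and {agent!r}; serialize or split the work")
        owners[key] = agent
    return {
        agent: {"worktree": f"{worktree_root}/{agent}-repo", "file": path}
        for agent, path in assignments.items()
    }


plan = plan_writers({"fork-6": "content/post-6.md", "fork-7": "content/post-7.md"})
print(plan["fork-6"])  # {'worktree': '~/fork-6-repo', 'file': 'content/post-6.md'}
```

Each entry then goes straight into that fork's prompt: which directory to cd into and the one file it is allowed to write.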

When NOT to fan out

Fanout is not the default answer. Skip it when:

The work is sequential. If step B needs step A's output, agents cannot run in parallel. Forcing it just adds coordination overhead.
The job is web-heavy with no judgment. That is a Python script, not a subagent fleet. See the cache leak above.
The scope is small. Three files do not need a worktree fleet. Just do it in one context.
You cannot make scope disjoint. If every agent has to touch the same files, serialize. Overlapping writers fight, and you pay in cherry-picks.

The honest summary: parallel subagents are a force multiplier when scope is clean and the cache is warm, and a money pit or a correctness hazard when either of those is off. Set the fork flag, warm before you fan out, keep general-purpose web-churners out of the loop, and give every parallel writer its own worktree. Get those four right and you can run a real fleet without burning your cap or your branches.

Topics

#Claude Code #developer tools
