⚡ Automation · advanced

How I Drive a Fleet of CLI Coding Agents From One tmux Session

capture-pane to read every session, send-keys to inject prompts, a one-line board across the whole fleet, and why the PTY layer beats every GUI automation trick I tried first.

By Aditya Sharma · Jun 28, 2026 · 8 min read

One orchestrator reading and injecting into a detached tmux fleet of CLI coding-agent panes via capture-pane and send-keys

The first night I tried to run ten coding agents at once, I did it the obvious way and it failed inside five minutes. I had ten claude sessions open across gnome-terminal tabs and a script that was supposed to type a prompt into each tab using xdotool. Nothing landed. The keystrokes vanished, the windows would not come to the front when I asked them to, and the one tab I could type into was whichever one I had clicked last. I threw the whole approach out and rebuilt it on tmux. The version I run now drives a fleet of interactive CLI agents from a single orchestrator session, reads what each one is doing, and injects prompts into any of them without a single window being visible.

This is the read-inject-digest loop, the reason it has to live at the PTY layer, and the gotchas I hit getting ten sessions stable.

Why GUI automation of a terminal does not work

Before tmux, I spent a frustrating evening confirming three separate blockers on my Ubuntu box. Each one alone is enough to kill GUI control of a terminal. Together they make the GUI route a dead end.

First, gnome-terminal's widget toolkit (VTE) drops synthetic X keystrokes. You can issue a perfectly valid xdotool type at the window and the characters never reach the shell. Second, the GNOME compositor (mutter) blocks programmatic window focus and raise as a focus-stealing defence, so you cannot even reliably bring the target terminal forward to type into it. Third, the old kernel escape hatch is closed: dev.tty.legacy_tiocsti is set to 0 by default now, which disables the TIOCSTI ioctl that once let a process push bytes straight into another terminal's input. I checked the live value and it was 0, so that door is bolted.
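You can check the third blocker on your own machine. The sketch below reads the sysctl through /proc. It assumes Linux, and because the switch only exists on newer kernels (it arrived around 6.2), it reports `absent` when the file is missing. The function name `tiocsti_status` is hypothetical.

```sh
# Report whether the kernel still allows TIOCSTI injection into another tty.
# Prints "enabled", "disabled", or "absent" (no such sysctl on this kernel).
tiocsti_status() {
  f=/proc/sys/dev/tty/legacy_tiocsti
  if [ ! -r "$f" ]; then
    echo absent
  elif [ "$(cat "$f")" = 0 ]; then
    echo disabled
  else
    echo enabled
  fi
}
```

On a box where the live value is 0, as it was on mine, this prints `disabled`.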

tmux walks around all three because it owns the pseudo-terminal. It does not type into a window, it does not ask the compositor for focus, and it does not touch TIOCSTI. It reads a pane's screen buffer directly and writes into a pane's input at the PTY layer. The session does not even need to be attached to a visible window. You drive N agents without N windows, read and write any of them, fully in parallel. In my testing that single property is the whole unlock.

The four primitives

The orchestrator is a small shell wrapper I wrote around tmux. It has four verbs: launch the fleet, peek at one session, tell one session a prompt, and broadcast to all of them. The agents themselves are real interactive TUIs, the Claude Code CLI (claude 2.1.195) and [email protected], not one-shot -p calls.

# launch N detached coding-agent sessions, one per tmux window
launch() {
  local name=$1 n=$2
  tmux new-session -d -s "$name" -x 220 -y 50
  for i in $(seq 1 $((n-1))); do tmux new-window -t "$name"; done
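  # iterate over the real window indexes, so a non-default base-index still works
  for w in $(tmux list-windows -t "$name" -F '#{window_index}'); do
    tmux send-keys -t "$name:$w" -l "claude --dangerously-skip-permissions"
    tmux send-keys -t "$name:$w" Enter
  done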
}

# read one pane's visible screen plus up to 120 lines of scrollback, no attach, no focus
peek() { tmux capture-pane -p -t "$1" -S -120; }

# inject a prompt as literal text, then submit
tell() {
  local t=$1; shift
  tmux send-keys -t "$t" -l "$*"
  tmux send-keys -t "$t" Enter
}

# same prompt into every window of the session
broadcast() {
  local name=$1; shift
  for w in $(tmux list-windows -t "$name" -F '#{window_index}'); do
    tmux send-keys -t "$name:$w" -l "$*"; tmux send-keys -t "$name:$w" Enter
  done
}

capture-pane -p prints the pane's visible buffer to stdout. The -S -120 flag walks 120 lines back into scrollback, which is plenty to see what an agent is mid-way through. send-keys -l sends the argument as literal text rather than interpreting it as tmux key names, then a second send-keys ... Enter submits. Reading and writing are decoupled, so the orchestrator can poll every pane on a loop and inject only where it decides to.
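The -l distinction is easy to see in isolation. The sketch below sends the text `C-u` once with -l and once without. It assumes tmux is installed and that the pane runs an ordinary shell where Ctrl-U discards the current input line, which holds for bash's default keymap and for the terminal's own line editing. `literal_demo` and its session name are hypothetical.

```sh
# Contrast send-keys -l (literal characters) with plain send-keys (tmux key names).
# Prints only the pure output lines it finds, then cleans up.
literal_demo() {
  s=literal_demo
  tmux new-session -d -s "$s" -x 80 -y 24
  sleep 0.5                                  # let the pane's shell start
  tmux send-keys -t "$s" -l 'echo one C-u'   # -l: "C-u" is typed as three characters
  tmux send-keys -t "$s" Enter               # key name: presses Enter
  tmux send-keys -t "$s" -l 'echo two'
  tmux send-keys -t "$s" C-u                 # key name: Ctrl-U discards the typed line
  tmux send-keys -t "$s" -l 'echo three'
  tmux send-keys -t "$s" Enter
  # Wait (up to ~5 s) for the last command's output to reach the screen.
  i=0
  while [ "$i" -lt 25 ] && ! tmux capture-pane -p -t "$s" | grep -qx 'three'; do
    sleep 0.2
    i=$((i + 1))
  done
  tmux capture-pane -p -t "$s" | grep -x -e 'one C-u' -e 'two' -e 'three'
  tmux kill-session -t "$s"
}
```

The expected output is `one C-u` followed by `three`. With -l, the string arrived as three characters. Without it, tmux pressed Ctrl-U, so `echo two` never ran.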

The board: one line per session

Ten raw capture-pane dumps are unreadable. What makes a fleet drivable is a digest: one line per session showing the foreground command and the last non-empty thing on screen. I poll this board every 3 seconds and treat a pane as stale if its tail has not changed in 240 seconds.
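The 240-second rule needs a little state per pane: the last tail seen and the time it last changed. The sketch below assumes the caller stores those two values between 3-second polls, for example in a small file per pane. `stale_check` and its argument order are hypothetical; only the threshold comes from the setup above.

```sh
# Seconds a pane's tail may stay unchanged before it counts as stale.
STALE_AFTER=${STALE_AFTER:-240}

# Args: previous tail, epoch second it last changed, current tail, now (epoch s).
# Prints "<changed_at> <fresh|stale>"; store changed_at for the next poll.
stale_check() {
  prev_tail=$1 changed_at=$2 cur_tail=$3 now=$4
  if [ "$cur_tail" != "$prev_tail" ]; then
    changed_at=$now                      # the tail moved: restart the clock
  fi
  if [ $((now - changed_at)) -ge "$STALE_AFTER" ]; then
    echo "$changed_at stale"
  else
    echo "$changed_at fresh"
  fi
}

# Usage inside the poll loop:
#   set -- $(stale_check "$prev" "$changed" "$cur" "$(date +%s)")
#   changed=$1 state=$2
```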

# one line per pane: index, foreground command, last non-empty line
board() {
  local name=$1
  tmux list-panes -t "$name" -s -F '#{window_index} #{pane_current_command}' \
  | while read -r idx cmd; do
      last=$(tmux capture-pane -p -t "$name:$idx" -S -8 \
             | grep -v '^[[:space:]]*$' | tail -1)
      printf '%-3s %-8s %s\n' "$idx" "$cmd" "${last:0:90}"
    done
}

A board sweep across ten panes finishes in well under a second because capture-pane is a local read against the tmux server, not a network call. The output reads like this:

Pane	Command	Last line on screen	Read as
0	node	esc to interrupt	working
1	node	>	idle, waiting for input
2	node	> refactor the parser into…	text typed, not submitted
3	bash	$	agent exited to shell

That four-state read (working, idle, needs-submit, exited) is the entire job of the orchestrator. It reads the board, decides which pane needs a nudge, and injects through tell. Everything else is the agents doing the actual work.
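The four-state read can be sketched as a classifier over the board's command and last-line columns. The markers it matches (`esc to interrupt`, a bare `>`, and `>` followed by text) come from the table above. They are assumptions that will drift as the TUI changes, and so is the list of shell names treated as exited. `classify` and the extra `unknown` state are hypothetical.

```sh
# Map one board row (foreground command, last non-empty line) to a state.
classify() {
  cmd=$1 last=$2
  case "$cmd" in
    bash|zsh|sh|dash|fish) echo exited; return ;;  # agent gone, shell in front
  esac
  case "$last" in
    *"esc to interrupt"*) echo working ;;
    ">"|"> ")             echo idle ;;          # empty compose box
    ">"*)                 echo needs-submit ;;  # text typed but not submitted
    *)                    echo unknown ;;
  esac
}

# Usage with the board() helper above:
#   board fleet | while read -r idx cmd last; do
#     echo "$idx $(classify "$cmd" "$last")"
#   done
```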

The gotchas I hit running this in production

A demo of three sessions hides the failure modes. Running ten in production for a full night surfaced four that I now design around.

The swallowed Enter is the most common. A Claude Code pane that is still initialising will accept the prompt text but eat the Enter, so the prompt sits unsent in the compose box and the session does nothing. My board catches it because the pane's last line still shows the prompt text. The fix is to re-send a bare Enter on the next poll: I sweep the board, find any pane whose tail still echoes my injected prompt, and fire Enter again until the turn actually starts.
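The resend check can be sketched as a comparison between the pane's last line and the prompt that was injected. Long lines get truncated both by the TUI and by the board's 90-character cut, so the sketch compares only the first 20 bytes of the prompt. That length, `still_unsent`, and `resubmit_if_needed` are assumptions for illustration.

```sh
# How many leading bytes of the prompt must still be visible to count as unsent.
PREFIX_LEN=${PREFIX_LEN:-20}

# Exit 0 if the last line still shows the start of the injected prompt,
# i.e. the text is sitting in the compose box and the Enter was swallowed.
still_unsent() {
  last=$1 prompt=$2
  want=$(printf "%.${PREFIX_LEN}s" "$prompt")
  [ -n "$want" ] || return 1             # an empty prompt is never "still there"
  case "$last" in
    *"$want"*) return 0 ;;
    *)         return 1 ;;
  esac
}

# One poll step: re-send a bare Enter while the prompt is still unsent.
resubmit_if_needed() {
  target=$1 prompt=$2
  last=$(tmux capture-pane -p -t "$target" | grep -v '^[[:space:]]*$' | tail -1)
  if still_unsent "$last" "$prompt"; then
    tmux send-keys -t "$target" Enter
  fi
}
```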

send-keys -l sends exactly the text you give it, which matters for control input. To run a slash command like /exit, you send the bare token, not a sentence about exiting. A numbered TUI dialog, like the one-time "trust this folder?" prompt a fresh Claude Code pane shows on first launch, gets a single digit and Enter, never a prose answer. I send 1 and move on.

The mouse-report flood took me a while to diagnose. The Claude Code TUI turns on xterm mouse tracking (\e[?1000h then \e[?1006h, escalating to all-motion reporting). If a compose box is left focused but unattended in an attached gnome-terminal, every physical mouse move over that window dribbles a literal \e[<35;col;rowM report into the box. Lean on the desk for a minute and the box fills with ^[[<35;..M garbage and the turn is dead. There is no flag to disable it. You scrub the box with C-u, never C-c, because C-c can exit the agent. The cleaner answer is the headless model itself: a detached pane has no window receiving mouse events, so the flood cannot happen. That alone is an argument for keeping the fleet detached.
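Detection and scrubbing fit in a few lines. The sketch below assumes the flood shows up in capture-pane output either as raw `ESC [ <` sequences or in the caret form `^[[<` quoted above. Button code 35 is 32 (motion) plus 3 (no button held). That is why the reports only arrive once any-event tracking (mode 1003) is on, and why a resting hand is enough to trigger them. `has_mouse_garbage` and `scrub_pane` are hypothetical names.

```sh
# Exit 0 if stdin contains SGR mouse reports such as "^[[<35;12;40M".
# Matching "[<btn;col;row" plus M/m catches both the raw ESC form and the caret form.
has_mouse_garbage() {
  grep -Eq '\[<[0-9]+;[0-9]+;[0-9]+[Mm]'
}

# Clear a polluted compose box with Ctrl-U (a key name, so no -l). Never C-c.
# Only the last few non-empty lines are checked, so old transcript text is ignored.
scrub_pane() {
  target=$1
  if tmux capture-pane -p -t "$target" | grep -v '^[[:space:]]*$' | tail -n 5 \
     | has_mouse_garbage; then
    tmux send-keys -t "$target" C-u
  fi
}
```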

Killing a session badly loses its transcript. tmux kill-window and kill-session send SIGHUP to the agent before its conversation log flushes to disk. If you might want to resume or read that session later, drain it with a literal /exit first and wait for the shell prompt to return, then kill the window. I learned that one by losing a session log I wanted back.
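The drain can be scripted. The sketch below assumes that when /exit finishes, the agent hands the pane back to an ordinary login shell, so tmux's pane_current_command reports bash, zsh, or similar. It also assumes 30 seconds is enough time for the log to flush. `wait_until`, `pane_is_shell`, `drain_and_kill`, and the timeout are hypothetical.

```sh
# Run a command once per second until it succeeds or $1 seconds have passed.
wait_until() {
  limit=$1; shift
  elapsed=0
  while ! "$@"; do
    if [ "$elapsed" -ge "$limit" ]; then return 1; fi
    sleep 1
    elapsed=$((elapsed + 1))
  done
  return 0
}

# True if the pane's foreground process is a plain shell again.
pane_is_shell() {
  case "$(tmux display-message -p -t "$1" '#{pane_current_command}')" in
    bash|zsh|sh|dash|fish) return 0 ;;
    *)                     return 1 ;;
  esac
}

# Send a literal /exit, wait for the shell prompt to return, only then kill the window.
drain_and_kill() {
  target=$1
  tmux send-keys -t "$target" -l '/exit'
  tmux send-keys -t "$target" Enter
  if wait_until 30 pane_is_shell "$target"; then
    tmux kill-window -t "$target"
  else
    echo "drain_and_kill: $target did not return to a shell; leaving it alive" >&2
    return 1
  fi
}
```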

Cost and model discipline

Idle interactive panes cost nothing. A coding-agent TUI only bills tokens while it runs a turn. Ten panes sitting at their prompt waiting for input are free, so a large detached fleet is cheap to keep warm between bursts of work. The real cost lever is model assignment, not session count. I run the fleet on the cheaper, faster model and keep the single orchestrator that reads the board and routes work on the premium model. Many cheap hands, one expensive brain.
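One way to express the split at launch time is to make the model an explicit flag. This relies on the Claude Code CLI's --model option, and it assumes your CLI version accepts the aliases `sonnet` and `opus`. Both are assumptions, so substitute whatever your plan and version actually offer. `agent_cmd` and `launch_fleet` are hypothetical; `launch_fleet` is the launch() function from earlier with the model made explicit.

```sh
# Model per role; the aliases are assumptions, use whatever your CLI accepts.
FLEET_MODEL=${FLEET_MODEL:-sonnet}   # many cheap hands
ORCH_MODEL=${ORCH_MODEL:-opus}       # one expensive brain

# Build the command line that gets typed into a pane.
agent_cmd() {
  printf 'claude --model %s --dangerously-skip-permissions\n' "$1"
}

# launch() with the model made explicit: n detached windows, all on FLEET_MODEL.
launch_fleet() {
  name=$1 n=$2
  cmd=$(agent_cmd "$FLEET_MODEL")
  tmux new-session -d -s "$name" -x 220 -y 50
  i=1
  while [ "$i" -lt "$n" ]; do
    tmux new-window -t "$name"
    i=$((i + 1))
  done
  for w in $(tmux list-windows -t "$name" -F '#{window_index}'); do
    tmux send-keys -t "$name:$w" -l "$cmd"
    tmux send-keys -t "$name:$w" Enter
  done
}
```

You would then start the orchestrator's own session with `agent_cmd "$ORCH_MODEL"`. The only difference between the hands and the brain is one flag.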

Start small and grow the same pattern. The first night I launched 3 sessions, injected 3 distinct prompts (print your model id, return the value of 391, name the capital of France), and read all three correct answers back off the board before I trusted it. Then I scaled the identical loop to 10 sessions. Validate concurrent-session behaviour and your subscription's parallel limits at 3 before you commit to 10.
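The 3-session check can be scripted as a poll for an expected answer in each pane. The sketch assumes a substring match is good enough, because an agent may wrap the answer in a sentence, and it allows 120 seconds per pane. `expect_in_pane` is hypothetical. The typed prompt is also on screen, so each check only works when the expected answer does not appear in the prompt itself. Paris is safe for the capital-of-France prompt; the word France is not.

```sh
# Poll a pane until its screen (plus 120 lines of scrollback) contains the
# expected text, or give up after $3 seconds (default 120).
expect_in_pane() {
  target=$1 expected=$2 limit=${3:-120}
  elapsed=0
  while [ "$elapsed" -lt "$limit" ]; do
    if tmux capture-pane -p -t "$target" -S -120 | grep -qF -- "$expected"; then
      echo "ok   $target: found '$expected'"
      return 0
    fi
    sleep 1
    elapsed=$((elapsed + 1))
  done
  echo "FAIL $target: '$expected' not seen within ${limit}s"
  return 1
}
```

For example, `expect_in_pane fleet:2 Paris`, where `fleet:2` stands in for whichever pane got that prompt.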

When not to reach for this

A fleet is for genuinely parallel, disjoint work. If the job is one task, use one session and skip the orchestration entirely. If you need pixel-level control of a web app, this is the wrong layer, because tmux drives terminals and not a browser. If two sessions must write the same files, serialize them or give each its own git worktree, the same isolation pattern I use for parallel subagents. And if you cannot watch it, do not run ten: a detached fleet still needs the board poll to catch a swallowed Enter, or one stuck pane sits dead for an hour while you assume it is working.
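For the shared-files case, here is a sketch of one worktree per window. It assumes the fleet is launched from a git repository with at least one commit. The sibling `wt-N` directories, the `agent-N` branch names, and both function names are hypothetical. tmux's new-window -c sets each window's starting directory. The -P -F flags print the new window's target, so keys go to the right place regardless of base-index.

```sh
# Add worktree number $2 of the repo at $1 on a new branch agent-$2,
# as a sibling directory of the repo; prints its absolute path.
add_worktree() {
  repo=$(cd "$1" && pwd) || return 1
  dir="$(dirname "$repo")/wt-$2"
  git -C "$repo" worktree add -b "agent-$2" "$dir" >/dev/null 2>&1 || return 1
  printf '%s\n' "$dir"
}

# n detached windows, each starting its agent inside its own worktree.
launch_isolated() {
  name=$1 repo=$2 n=$3
  i=0
  while [ "$i" -lt "$n" ]; do
    dir=$(add_worktree "$repo" "$i") || return 1
    if [ "$i" -eq 0 ]; then
      t=$(tmux new-session -d -P -F '#{session_name}:#{window_index}' -s "$name" -c "$dir")
    else
      t=$(tmux new-window -d -P -F '#{session_name}:#{window_index}' -t "$name" -c "$dir")
    fi
    tmux send-keys -t "$t" -l 'claude --dangerously-skip-permissions'
    tmux send-keys -t "$t" Enter
    i=$((i + 1))
  done
}
```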

What I learned across these nights is that orchestrating terminals is a solved problem the moment you stop fighting the GUI. Read at the PTY layer, inject at the PTY layer, digest into one line per session, and the fleet becomes something you actually steer rather than something you constantly watch.

Topics

#tmux #CLI agents #orchestration #Claude Code #automation #Linux
