From backlog to pull request, without human intervention. Swarm Pilot is an autonomous AI software pod that plugs into Azure DevOps and ships code while you sleep.
Four reasons software teams ship late, and how we remove all four.
Vague tickets pile up sprint after sprint, cluttering the workspace and eroding planning capacity.
The divide between what stakeholders asked for and what ships grows wider every cycle.
Shifting between planning, coding, and testing kills velocity; every interrupt is a tax.
Delays accrue at every handoff: PM → BA → Dev → QA. Nothing moves faster than the slowest human in the chain.
Your backlog is the prompt. Your PRs are the output.
Reacts instantly to workitem.updated events. Polling mode available for closed networks.
Swarm_Ready, Ready_For_Dev, Verified: ADO tags ARE the state machine.
Agents automatically inherit context from dependency relations. No reinventing work a predecessor already did.
Creates pull requests by default or, with a Commit tag, squash-merges straight to main.
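The tag-driven lifecycle described above can be sketched in a few lines. This is an illustrative sketch only: the three tag names come from the text, but their ordering, the function names, and the transition rule are assumptions, not Swarm Pilot's actual implementation.

```python
from typing import Optional

# Lifecycle tags named in the text. The left-to-right ordering is an
# assumption made for this sketch.
LIFECYCLE = ["Swarm_Ready", "Ready_For_Dev", "Verified"]


def current_state(tags: set[str]) -> Optional[str]:
    """Return the most advanced lifecycle tag on a work item, if any."""
    present = [t for t in LIFECYCLE if t in tags]
    return present[-1] if present else None


def advance(tags: set[str]) -> set[str]:
    """Return a new tag set with the lifecycle tag moved one step forward.

    Non-lifecycle tags (for example "Commit") are preserved untouched.
    """
    state = current_state(tags)
    if state is None:
        raise ValueError("work item has no lifecycle tag")
    idx = LIFECYCLE.index(state)
    if idx == len(LIFECYCLE) - 1:
        return set(tags)  # already in the final state
    others = {t for t in tags if t not in LIFECYCLE}
    return others | {LIFECYCLE[idx + 1]}
```

Because the state lives entirely in the tag set, a human can inspect or override it from the ADO board without touching any separate database.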
Each agent has a named role, a locked-down tool allowlist, and a defined AI personality. Dev can't approve its own code. QA can't rewrite files. PM can't edit specs.
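A per-role tool allowlist of this kind might look like the sketch below. The role names follow the text; every tool name and the exact role-to-tool mapping are hypothetical and exist only to mirror the three restrictions stated above.

```python
# Hypothetical role -> allowed-tool mapping. Tool names are invented for
# illustration; only the restrictions themselves come from the text.
ALLOWLIST: dict[str, frozenset[str]] = {
    "PM": frozenset({"read_ticket", "route_ticket"}),
    "BA": frozenset({"read_ticket", "read_repo", "edit_spec"}),
    "Dev": frozenset({"read_repo", "write_file", "run_command", "push_branch"}),
    "QA": frozenset({"read_repo", "run_command", "approve_code", "create_pr"}),
}


class ToolNotAllowed(PermissionError):
    """Raised when an agent requests a tool outside its allowlist."""


def is_allowed(role: str, tool: str) -> bool:
    """Return True only if the tool is explicitly listed for the role."""
    return tool in ALLOWLIST.get(role, frozenset())


def authorize(role: str, tool: str) -> None:
    """Raise ToolNotAllowed unless the role may use the tool."""
    if not is_allowed(role, tool):
        raise ToolNotAllowed(f"{role} may not call {tool}")
```

The default is deny: an unknown role or an unlisted tool is rejected, so adding a new tool never silently widens an agent's permissions.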
Triage → Analysis → Build → Verify
Triages tickets, checks dependencies, routes work to specialists.
Converts vague asks into Gherkin specs, grounded in the real codebase.
Writes code in sandboxes, pushes feature branches, leaves servers running for QA.
Black-box HTTP probes, Playwright UI verification, PR creation or Fast Track merge.
Map · Generate · Decompose: run once per product
Maps your codebase and writes the machine-readable README_AI.md.
Reads your PRD and produces the Architecture Backlog of enabler tickets.
Turns a PRD into a full Epic → Feature → Story hierarchy with Gherkin AC.
Breaks User Stories into atomic Tasks (max 5 each) with predecessor chains.
Audit and review: invoked independently
Deduplicates, scores, and sanitizes your backlog. Filters the noise so only actionable work survives.
Audits for security risks, architectural debt, and hidden vulnerabilities. Nothing ships unchecked.
Point it at a new product. Walk away.
The Cartographer scans your repo and writes README_AI.md, a manual every agent reads before touching code.
The Librarian scans your ADO backlog, flags stale (>1yr) tickets, deduplicates via semantic similarity, and stages the rest.
Solutions_Architect + Generative_BA read your PRD and produce a full Epic → Feature → Story backlog with Gherkin AC.
Decomposer_BA splits each User Story into atomic Tasks and wires up execution order so runtime agents can just pick and ship.
Everything stays in New until a human promotes selected tickets to Active. That's when the runtime swarm wakes up.
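Wiring up execution order from predecessor links amounts to a topological sort. The sketch below uses Python's standard graphlib module and assumes a simple mapping from task ID to the IDs that must finish first; the task IDs and the function name are hypothetical, and the five-task cap is the per-story limit quoted earlier.

```python
from graphlib import CycleError, TopologicalSorter

MAX_TASKS_PER_STORY = 5  # limit quoted in the text


def execution_order(predecessors: dict[str, set[str]]) -> list[str]:
    """Return task IDs in an order that respects every predecessor link.

    `predecessors` maps a task ID to the set of task IDs that must be
    completed before it. Raises CycleError if the links form a cycle.
    """
    order = list(TopologicalSorter(predecessors).static_order())
    if len(order) > MAX_TASKS_PER_STORY:
        raise ValueError(f"story has {len(order)} tasks, limit is {MAX_TASKS_PER_STORY}")
    return order


# Hypothetical story: T2 and T3 both depend on T1, and T4 depends on both.
story = {"T1": set(), "T2": {"T1"}, "T3": {"T1"}, "T4": {"T2", "T3"}}
print(execution_order(story))
```

A runtime agent can then pick the first task in the list whose predecessors are all done, and a cycle in the links is caught at planning time instead of deadlocking the swarm.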
Where agents actually work. Every ticket gets its own stateful Linux environment. None of it touches your laptop.
cd frontend stays effective between commands. QA inherits the exact running dev server Builder_Dev left behind.
Local-docker for dev, AKS for production. Agents never know which is in use; the SSH transport is identical.
Full isolation from host. Configurable TTL. Auto-cleaned on ticket close. Every command streamed to Mission Control.
A cinematic, real-time visualization of your AI team at work. Watch every agent think, code, and collaborate, with full DVR-style playback of any session.
Full PM → BA → Dev → QA lifecycle. Humans approve output, not process.
Agents accumulate institutional knowledge across tickets: conventions, failure patterns, and codebase pitfalls. The swarm gets smarter over time.
Every operation runs in an isolated Docker/AKS environment. Safe and secure.
Semantic search over your entire repo via ChromaDB. Monorepo-aware.
Agents consult each other before escalating. Fully traceable decision logs.
Headless Playwright screenshots attached directly to ADO work items.
21 curated guidance documents covering project structure, routing, state management, and testing patterns. Agents read only the docs relevant to the current task, with no prompt bloat.
Autonomy without observability is a liability. Swarm Pilot assumes things will break.
60-second liveness pulses visible in logs and the Mission Control dashboard.
Stuck agents are forcibly released to prevent blockages.
Redis checkpointing resumes agents from their exact state after restarts.
Lock contention and rate limits handled gracefully.
On boot, re-dispatches any tickets that were mid-processing when the last container died.
Auto-captures learnings after handoffs and rejections; top-5 relevant memories inject into future prompts.
QA must consult Dev before rejecting. Inline consultation without losing ticket ownership.
21 framework-specific standards read only when needed, not front-loaded into every prompt.
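The liveness pulses and forced release of stuck agents described above could be combined in a small watchdog check. This is a sketch under stated assumptions: the 60-second pulse interval is from the text, while the three-missed-pulses threshold, the function name, and the data shape are hypothetical.

```python
PULSE_INTERVAL_S = 60      # pulse interval quoted in the text
MISSED_PULSES_ALLOWED = 3  # assumption: tolerance before an agent counts as stuck


def stuck_agents(last_pulse: dict[str, float], now: float) -> list[str]:
    """Return the agents whose last pulse is older than the allowed window.

    `last_pulse` maps an agent name to the timestamp (in seconds) of its
    most recent liveness pulse; `now` is the current time in the same units.
    """
    deadline = PULSE_INTERVAL_S * MISSED_PULSES_ALLOWED
    return sorted(name for name, ts in last_pulse.items() if now - ts > deadline)


# Hypothetical timestamps: Builder_Dev last pulsed 200 s ago, Critic_QA 30 s ago.
pulses = {"Builder_Dev": 1000.0, "Critic_QA": 1170.0}
print(stuck_agents(pulses, now=1200.0))
```

Whatever releases the lock for each returned agent is outside this sketch; the point is that a missed-pulse threshold turns "stuck" into a measurable condition.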
Modern. Modular. Production-ready.
Session 18: 34 tickets processed autonomously over 6 hours.
| Agent | Model | Cost Share |
|---|---|---|
| Builder_Dev | Claude Sonnet 4.6 | ~67% |
| Clarifier_BA | Claude Sonnet 4.6 | ~14% |
| Critic_QA | Claude Sonnet 4.6 | ~14% |
| Sentinel_PM | Claude Haiku 4.5 | ~5% |

| Infrastructure Component | Monthly Cost |
|---|---|
| AKS Cluster (2× D2s_v3 nodes) | $392/mo |
| PostgreSQL Flexible Server | $17/mo |
| Azure Cache for Redis | $16/mo |
| Container Registry + Networking | $10/mo |
| Monitoring (free tier) | $0/mo |
| Infrastructure Total | ~$435/mo |

* Cost and token metrics derived from Anthropic API usage. Cache hit rate is calculated as Cache Read ÷ (Cache Read + Non-cached Input). Infrastructure cost is separate.
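The cache hit rate formula in the footnote translates directly into code. The sketch below is a plain restatement of that formula; the function and parameter names are assumptions, and the token counts in the example are made up for illustration, not measured results.

```python
def cache_hit_rate(cache_read_tokens: int, non_cached_input_tokens: int) -> float:
    """Cache Read / (Cache Read + Non-cached Input), as a fraction in [0, 1].

    Returns 0.0 when there was no input at all, to avoid dividing by zero.
    """
    total = cache_read_tokens + non_cached_input_tokens
    if total == 0:
        return 0.0
    return cache_read_tokens / total


# Illustrative numbers only: 900 tokens read from cache, 100 sent uncached.
print(f"{cache_hit_rate(900, 100):.0%}")
```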
Multi-swarm: marginal cost ~$0 per additional instance (up to ~20 swarms on the same cluster).
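To make the multi-swarm claim concrete, the arithmetic can be sketched as follows. The line items are the monthly infrastructure figures listed above; the assumption, taken from the "marginal cost ~$0" statement, is that the cluster bill stays flat as swarms are added, so the per-swarm share is simply the total divided by the swarm count. The function name is hypothetical.

```python
# Monthly infrastructure line items in USD, as listed above.
INFRA_USD_PER_MONTH = {
    "AKS Cluster": 392,
    "PostgreSQL Flexible Server": 17,
    "Azure Cache for Redis": 16,
    "Container Registry + Networking": 10,
    "Monitoring (free tier)": 0,
}


def infra_cost_per_swarm(swarms: int) -> float:
    """Amortized monthly infrastructure cost per swarm on one shared cluster.

    Assumes the total bill does not grow with the number of swarms.
    """
    if swarms < 1:
        raise ValueError("swarms must be at least 1")
    return sum(INFRA_USD_PER_MONTH.values()) / swarms


print(infra_cost_per_swarm(1))   # whole bill carried by a single swarm
print(infra_cost_per_swarm(20))  # share at the ~20-swarm ceiling quoted above
```

This covers infrastructure only; model API usage is billed separately and scales with the tickets each swarm processes.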
Shipping fast. Building forward.
Agents accumulate institutional knowledge across tickets: conventions, failure patterns, and codebase pitfalls. The swarm gets smarter over time.
Full Docker Compose orchestration. One command spins up the entire swarm with Redis, PostgreSQL, ChromaDB, and state persistence.
Manage multiple repositories from a single swarm instance.
Expand beyond Azure DevOps to GitHub Issues and Pull Requests.
Automated feedback loops from QA failures back to BA refinement.
See Swarm Pilot run a real ticket end-to-end. Typical demo runs 30 minutes.