Estimate sustained throughput, effective utilization, and queue wait for a concurrent multi-agent workflow. Adjust workers, latency, and arrival rate to find the bottleneck before you scale.
Throughput in a concurrent agent system is governed by the same math as a multi-server queue. With W workers each finishing one invocation in L seconds, one worker completes 1/L invocations per second, so the raw service capacity in tasks per second is:
capacity = (W × B) / L
where B is the number of tasks handled per batch invocation. Orchestration is never free, so we discount capacity by the overhead fraction o (a value between 0 and 1) to get effective capacity C_eff = capacity × (1 - o). This overhead captures the scheduler, message routing, retries, and serialization that steal time from useful work in real multi-agent pipelines.
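The capacity formula above can be sketched in a few lines of Python. This is a minimal illustration, not the tool's actual code: the function name, parameter names, and input values are hypothetical, and the input checks are an assumption about what counts as a sensible configuration.

```python
def effective_capacity(workers: int, batch_size: int,
                       latency_s: float, overhead: float) -> float:
    """Return C_eff in tasks/sec: (W x B / L) discounted by overhead o."""
    # Assumed validity ranges; the original text does not state them.
    if workers < 1 or batch_size < 1:
        raise ValueError("workers and batch_size must be at least 1")
    if latency_s <= 0:
        raise ValueError("latency_s must be positive")
    if not 0.0 <= overhead < 1.0:
        raise ValueError("overhead must be in [0, 1)")

    raw_capacity = workers * batch_size / latency_s  # capacity = (W x B) / L
    return raw_capacity * (1.0 - overhead)           # C_eff = capacity x (1 - o)


# Hypothetical fleet: 8 workers, 4 tasks per invocation, 2 s latency, 25% overhead.
print(effective_capacity(workers=8, batch_size=4, latency_s=2.0, overhead=0.25))  # 12.0
```

With these example inputs the raw capacity is 16 tasks/sec, and the 25% overhead brings it down to 12.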
The system can only sustain whatever the slower side allows, so actual sustained throughput = min(arrival_rate, C_eff), with both measured in tasks per second. Utilization is ρ = arrival_rate / C_eff. When ρ < 1 the pipeline keeps up; as ρ approaches 1, queue wait grows sharply.
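As a rough sketch of the two quantities just defined, the snippet below takes an arrival rate and an effective capacity (both in tasks/sec) and returns throughput and utilization. The function name and the sample numbers are illustrative assumptions.

```python
from typing import Tuple


def throughput_and_utilization(arrival_rate: float, c_eff: float) -> Tuple[float, float]:
    """Return (sustained throughput in tasks/sec, utilization rho)."""
    if c_eff <= 0:
        raise ValueError("c_eff must be positive")
    if arrival_rate < 0:
        raise ValueError("arrival_rate must be non-negative")

    throughput = min(arrival_rate, c_eff)  # the slower side wins
    rho = arrival_rate / c_eff             # rho >= 1 means the fleet cannot keep up
    return throughput, rho


# Hypothetical values: effective capacity of 12 tasks/sec.
print(throughput_and_utilization(9.0, 12.0))   # (9.0, 0.75): arrivals are the limit
print(throughput_and_utilization(15.0, 12.0))  # (12.0, 1.25): capacity is the limit
```

In the second call the fleet is saturated: throughput is pinned at capacity while utilization above 1 signals a growing backlog.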
We approximate the mean time a task waits before an agent picks it up using an M/M/c-style heavy-traffic factor: wait ≈ (ρ / (1 - ρ)) × (L / W) seconds. The estimate is intentionally conservative: it climbs sharply once utilization passes ~85%, which is exactly where adding workers or trimming latency pays off most. If ρ ≥ 1 the queue is unstable and grows without bound, so we flag it instead of printing a finite number.
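The wait approximation and its instability flag might look like the following in Python. This is a sketch under stated assumptions: the function name is hypothetical, and returning None to mean "unstable" is one possible way to flag the condition, not necessarily how the tool reports it.

```python
from typing import Optional


def queue_wait_s(rho: float, latency_s: float, workers: int) -> Optional[float]:
    """Approximate mean queue wait in seconds, or None if the queue is unstable."""
    if rho < 0 or latency_s <= 0 or workers < 1:
        raise ValueError("rho must be >= 0, latency_s > 0, workers >= 1")
    if rho >= 1.0:
        return None  # unstable: the backlog grows without bound
    # wait ~ (rho / (1 - rho)) x (L / W)
    return rho / (1.0 - rho) * (latency_s / workers)


# Hypothetical fleet: 8 workers with 2 s latency, at rising utilization.
for rho in (0.5, 0.75, 0.9, 0.95, 1.0):
    wait = queue_wait_s(rho, latency_s=2.0, workers=8)
    label = "unstable" if wait is None else f"{wait:.2f} s"
    print(f"rho={rho:.2f} -> {label}")
```

Running the loop shows the nonlinearity the text describes: moving from 50% to 75% utilization triples the estimated wait, and moving from 90% to 95% roughly doubles it again.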
The marginal-worker hint compares current effective capacity against capacity with one extra worker, telling you the incremental tasks/sec a new agent slot buys, which is useful when each concurrent agent has a real dollar cost. Unlike generic throughput calculators, this tool folds orchestration overhead and batch size directly into the capacity term, so the number reflects an agent fleet rather than an idealized CPU.
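The marginal-worker comparison can be illustrated as below. The function name and inputs are hypothetical, and the sketch assumes the overhead fraction stays fixed when a worker is added, which is what the capacity formula as written implies; a real fleet may see overhead rise with more agents.

```python
def marginal_worker_gain(workers: int, batch_size: int,
                         latency_s: float, overhead: float) -> float:
    """Extra effective tasks/sec gained by going from W to W + 1 workers."""
    if workers < 1 or batch_size < 1 or latency_s <= 0 or not 0.0 <= overhead < 1.0:
        raise ValueError("invalid configuration")

    def c_eff(w: int) -> float:
        # C_eff = (W x B / L) x (1 - o), with o assumed independent of W
        return w * batch_size / latency_s * (1.0 - overhead)

    return c_eff(workers + 1) - c_eff(workers)


# Hypothetical fleet: 8 workers, 4 tasks per invocation, 2 s latency, 25% overhead.
gain = marginal_worker_gain(workers=8, batch_size=4, latency_s=2.0, overhead=0.25)
print(gain)  # 1.5
```

Under the fixed-overhead assumption the gain works out to B × (1 - o) / L for any W, so dividing a per-worker cost by this figure gives a rough price per additional task/sec.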