Estimate End-to-End Latency of Your Agent Pipeline

Model each stage of a multi-agent workflow, then compare a strictly sequential run against a concurrency-aware schedule. The tool surfaces your critical path and the single stage costing you the most wall-clock time.


Each stage's latency = base time (ms) + tokens ÷ throughput (tokens/s), with the token term converted to milliseconds. Stages sharing the same group label run concurrently, up to the max concurrent agents limit.

How the latency model works

Every stage in a multi-agent pipeline contributes a per-stage latency computed as base_ms + (tokens / throughput) × 1000, where throughput is tokens per second for that agent's model and the × 1000 converts generation time from seconds to milliseconds. The base time absorbs queueing, tool calls, and time-to-first-token; the token term captures generation length. You enter these values per stage so the math reflects your real workload, not a generic benchmark.
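A minimal Python sketch of the per-stage formula follows. The `Stage` class, its field names, and `stage_latency_ms` are hypothetical names chosen for illustration; they are not the calculator's own code, and the example numbers are made up.

```python
from dataclasses import dataclass


@dataclass
class Stage:
    """One pipeline stage. Field names are illustrative, not the calculator's own."""
    name: str
    base_ms: float         # queueing, tool calls, time-to-first-token
    tokens: int            # tokens the stage generates
    throughput_tps: float  # generation speed in tokens per second


def stage_latency_ms(stage: Stage) -> float:
    """base_ms + (tokens / throughput) * 1000; the * 1000 turns seconds into ms."""
    if stage.throughput_tps <= 0:
        raise ValueError("throughput must be positive")
    return stage.base_ms + (stage.tokens / stage.throughput_tps) * 1000.0


# Hypothetical planner: 400 ms of base time, 600 tokens at 80 tokens/s.
planner = Stage("planner", base_ms=400.0, tokens=600, throughput_tps=80.0)
print(stage_latency_ms(planner))  # 7900.0
```

Here 600 tokens at 80 tokens/s is 7.5 s of generation, so the token term (7,500 ms) dominates the 400 ms base time.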

The sequential total is the naive sum: every stage waits for the previous one to finish, and each of the n − 1 transitions between consecutive stages adds one orchestration handoff. This is the latency you get from a simple chain where each agent's output feeds the next. Formally, T_seq = Σ stage_i + (n − 1) × overhead.
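The sequential total can be sketched the same way. `sequential_total_ms` is a hypothetical helper that takes already-computed stage latencies in milliseconds; the four latencies below are invented for illustration.

```python
def sequential_total_ms(stage_ms: list[float], overhead_ms: float) -> float:
    """T_seq = sum of stage latencies + (n - 1) orchestration handoffs."""
    n = len(stage_ms)
    if n == 0:
        return 0.0
    return sum(stage_ms) + (n - 1) * overhead_ms


# Hypothetical four-stage chain: planner -> researcher -> writer -> reviewer.
latencies = [7900.0, 5200.0, 9100.0, 3000.0]
print(sequential_total_ms(latencies, overhead_ms=150.0))  # 25650.0
```

For this invented chain, that is 25,200 ms of stage work plus three 150 ms handoffs.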

The parallel total models concurrency. Stages that share a group label can run at the same time, bounded by your max concurrent agents limit. Within a group, the calculator uses a greedy longest-processing-time-first (LPT) scheduling pass: stages are sorted from heaviest to lightest, each is assigned to the currently least-loaded worker slot, and the group's duration is the load of the busiest slot. Groups still run one after another, so the pipeline total is the sum of the groups' packed durations plus one handoff between consecutive groups: T_par = Σ group_g + (G − 1) × overhead, where G is the number of groups. When a group has more stages than workers, the calculator shows how the queue serializes the overflow; this is the same effect you see when ten subagents fan out onto three concurrent connections.
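The grouping and packing logic can be sketched with Python's standard `heapq` module, which keeps the least-loaded worker slot at the top of a min-heap. This is an illustrative reconstruction, not the calculator's source: `pack_group_ms` and `parallel_total_ms` are hypothetical names, and the sketch assumes that all stages carrying the same label form one group, with groups run in the order their labels first appear.

```python
import heapq


def pack_group_ms(latencies: list[float], workers: int) -> float:
    """Greedy LPT: heaviest stage first, each onto the least-loaded slot.

    Returns the load of the busiest slot, i.e. the group's duration.
    """
    if workers < 1:
        raise ValueError("need at least one worker")
    # Slots beyond the number of stages would stay empty, so skip them.
    slots = [0.0] * min(workers, len(latencies))
    heapq.heapify(slots)  # min-heap: slots[0] is always the least-loaded slot
    for ms in sorted(latencies, reverse=True):
        lightest = heapq.heappop(slots)
        heapq.heappush(slots, lightest + ms)
    return max(slots, default=0.0)


def parallel_total_ms(stages: list[tuple[str, float]], workers: int,
                      overhead_ms: float) -> float:
    """T_par = sum of packed group durations + (G - 1) handoffs."""
    groups: dict[str, list[float]] = {}
    for label, ms in stages:  # dicts keep insertion order: first-seen group runs first
        groups.setdefault(label, []).append(ms)
    packed = [pack_group_ms(ms_list, workers) for ms_list in groups.values()]
    return sum(packed) + max(len(packed) - 1, 0) * overhead_ms


# Hypothetical fan-out: one planner, ten 1,000 ms subagents, one merger.
stages = [("plan", 2000.0)] + [("fanout", 1000.0)] * 10 + [("merge", 1500.0)]
print(pack_group_ms([1000.0] * 10, workers=3))                  # 4000.0
print(parallel_total_ms(stages, workers=3, overhead_ms=100.0))  # 7700.0
```

With three workers, the ten 1,000 ms subagents take 4,000 ms instead of 1,000 ms, because they run in four rounds on three slots. LPT is a heuristic rather than an exact solver: for stages of 5, 4, 3, 3, and 3 ms on two workers it returns 10 ms, while the best split (5 + 4 and 3 + 3 + 3) finishes in 9 ms.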

The critical path is the longest chain of work that cannot be parallelized away. In this model it runs through the busiest worker slot of each group, and no group can finish before its single slowest stage, so the calculator flags the stage with the largest individual latency as your bottleneck. Shaving time there, splitting it up, or moving it to a faster model usually moves the whole pipeline more than optimizing any cheaper stage. Speed-up is simply T_seq / T_par: a value above 1 means concurrency is buying you wall-clock time, while a value near 1 means your stages are too serially dependent to benefit from more workers. Use it to decide whether adding agents or shrinking the bottleneck is the better next move.
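The bottleneck and speed-up figures reduce to a few lines. `bottleneck` and `speed_up` are hypothetical helpers, and the latencies and totals in the example are invented for illustration (the 14,600 ms and 7,700 ms pair stands in for a fan-out pipeline run sequentially and in parallel).

```python
def bottleneck(stage_ms: dict[str, float]) -> tuple[str, float]:
    """Stage with the largest individual latency; on ties, the first one listed wins."""
    if not stage_ms:
        raise ValueError("no stages")
    name = max(stage_ms, key=stage_ms.__getitem__)
    return name, stage_ms[name]


def speed_up(t_seq_ms: float, t_par_ms: float) -> float:
    """T_seq / T_par: above 1 means concurrency saves wall-clock time."""
    if t_par_ms <= 0:
        raise ValueError("parallel total must be positive")
    return t_seq_ms / t_par_ms


# Hypothetical four-stage chain with per-stage latencies in ms.
chain = {"planner": 7900.0, "researcher": 5200.0, "writer": 9100.0, "reviewer": 3000.0}
print(bottleneck(chain))                    # ('writer', 9100.0)
print(round(speed_up(14600.0, 7700.0), 2))  # 1.9
print(speed_up(25650.0, 25650.0))           # 1.0
```

The last call shows the degenerate case: if every stage sits in its own group, each group's packed duration equals its single stage and there are n − 1 handoffs, so T_par equals T_seq and the speed-up is exactly 1.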
