Estimate Token Spend When AI Agents Retry on Failure

When an agent step can fail and retry, your real cost is a probability-weighted sum across attempts — not just one call. Enter your error rate, retry limit, and per-attempt tokens to see the expected cost, attempt count, and worst case.

How the retry cost model works

A naive estimate multiplies one call's cost by your task count. That undercounts any pipeline where a failed attempt is retried, because every retry consumes the full input and output tokens again. This calculator computes the expected number of attempts per task as a finite geometric series, then scales tokens and price by it.

Let p = failure probability per attempt, R = max retries.
Expected attempts per task = 1 + p + p² + … + p^R = (1 − p^(R+1)) / (1 − p) for p < 1, and exactly R + 1 when p = 1.
Expected tokens per task = expected attempts × (input + output tokens per attempt).
Final-failure probability = p^(R+1) (every attempt failed).
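
The three formulas above translate directly into code. This is a minimal sketch, not the calculator's own implementation; the function names are hypothetical, and it assumes, as the geometric series does, that every attempt fails independently with the same probability p. The p = 1 branch is needed because the closed form divides by zero there.

```python
def expected_attempts(p: float, max_retries: int) -> float:
    """Closed form of 1 + p + p^2 + ... + p^R, with R = max_retries."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a probability in [0, 1]")
    if max_retries < 0:
        raise ValueError("max_retries must be non-negative")
    if p == 1.0:
        # The closed form divides by zero here; every one of the R + 1 attempts runs.
        return float(max_retries + 1)
    return (1 - p ** (max_retries + 1)) / (1 - p)


def expected_tokens(p: float, max_retries: int, input_tokens: int, output_tokens: int) -> float:
    """Each attempt re-sends the full input and regenerates the full output."""
    return expected_attempts(p, max_retries) * (input_tokens + output_tokens)


def final_failure_probability(p: float, max_retries: int) -> float:
    """Probability that all R + 1 attempts fail (independent attempts assumed)."""
    return p ** (max_retries + 1)


# Hypothetical inputs: 10% failure rate, up to 3 retries, 2,000 input + 500 output tokens.
p, r = 0.1, 3
print(f"expected attempts: {expected_attempts(p, r):.3f}")
print(f"expected tokens:   {expected_tokens(p, r, 2_000, 500):,.1f}")
print(f"final failure:     {final_failure_probability(p, r):.4%}")
```

With these inputs the series is 1 + 0.1 + 0.01 + 0.001 = 1.111 attempts, so a task that naively costs 2,500 tokens is expected to cost about 2,777.5, and 0.1^4 = 0.01% of tasks still fail after every retry.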

The series captures the diminishing likelihood of each successive retry: assuming each attempt fails independently with the same probability p, the second attempt only runs with probability p, the third with p², and so on. As p approaches 0 the expected attempts approach 1; as p approaches 1 they approach R+1, the hard ceiling. Because attempts are capped by the retry limit, the expected count never exceeds R+1 even at high error rates, and that cap is what makes the worst-case column meaningful for budgeting.
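
One way to convince yourself the series is right is to simulate the retry loop and compare the average attempt count against the direct sum. The sketch below is illustrative only: `simulate_attempts` and `expected_attempts_series` are hypothetical names, and the simulation bakes in the same independence assumption as the formula. It also shows the two limits, since p = 0 always yields exactly 1 attempt and p = 1 always yields exactly R + 1.

```python
import random


def expected_attempts_series(p: float, max_retries: int) -> float:
    """Direct sum 1 + p + ... + p^R; valid for every p in [0, 1], including p = 1."""
    return sum(p ** k for k in range(max_retries + 1))


def simulate_attempts(p: float, max_retries: int, trials: int = 100_000, seed: int = 0) -> float:
    """Monte Carlo estimate of the mean number of attempts per task.

    Assumes each attempt fails independently with probability p,
    the same assumption the geometric series makes.
    """
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        # Attempts are numbered 1 .. R + 1; the retry limit caps the loop.
        for attempt in range(1, max_retries + 2):
            if rng.random() >= p:  # success with probability 1 - p: stop retrying
                break
        total += attempt  # attempts actually consumed by this task
    return total / trials


for p in (0.0, 0.3, 0.9, 1.0):
    print(p, round(expected_attempts_series(p, 3), 4), round(simulate_attempts(p, 3), 4))
```

The summed form is used here instead of the closed form so the p = 1 case needs no special branch; for large R the closed form is cheaper, but both give the same value.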

Three figures matter most for an orchestration budget. The expected cost tells you the average bill once retries are amortized over many tasks. The worst-case cost assumes every task exhausts all R+1 attempts, which is the ceiling your concurrency and rate-limit headroom must survive during an incident. The final-failure rate, p^(R+1), is the share of tasks that still fail after all retries. Those tasks need a dead-letter path, human escalation, or a fallback model, and they are pure sunk cost: they consumed every attempt without producing a usable result. Tune the retry limit against this number. Raising R shrinks final failures but inflates worst-case spend, so the right setting balances reliability against the tail of your token bill.
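
To see the trade-off when tuning R, you can tabulate expected cost, worst-case cost, and final-failure rate for a range of retry limits. This is a hedged sketch: `retry_budget`, `RetryBudget`, and all the input values are hypothetical, and it assumes a single blended per-token price, whereas real providers often price input and output tokens differently.

```python
from dataclasses import dataclass


@dataclass
class RetryBudget:
    max_retries: int
    expected_cost: float       # average spend across all tasks
    worst_case_cost: float     # every task uses all R + 1 attempts
    final_failure_rate: float  # share of tasks that fail every attempt


def retry_budget(p: float, max_retries: int, tokens_per_attempt: int,
                 price_per_token: float, tasks: int) -> RetryBudget:
    expected_attempts = sum(p ** k for k in range(max_retries + 1))
    cost_per_attempt = tokens_per_attempt * price_per_token  # blended price assumed
    return RetryBudget(
        max_retries=max_retries,
        expected_cost=tasks * expected_attempts * cost_per_attempt,
        worst_case_cost=tasks * (max_retries + 1) * cost_per_attempt,
        final_failure_rate=p ** (max_retries + 1),
    )


# Hypothetical workload: 20% failure rate, 2,500 tokens per attempt,
# $2 per million tokens, 10,000 tasks.
for r in range(5):
    b = retry_budget(p=0.2, max_retries=r, tokens_per_attempt=2_500,
                     price_per_token=0.000002, tasks=10_000)
    print(f"R={b.max_retries}  expected=${b.expected_cost:,.2f}  "
          f"worst=${b.worst_case_cost:,.2f}  final-fail={b.final_failure_rate:.4%}")
```

With these made-up numbers, going from R = 0 to R = 4 cuts the final-failure rate from 20% to 0.032% while the expected bill only rises from $50.00 to $62.48; the worst case, however, grows linearly from $50.00 to $250.00, which is the headroom an incident could demand.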
