Estimate Your Prompt Caching Cost Savings
Prompt caching lets agents and orchestration pipelines reuse a large static prefix (system prompt, tool schemas, retrieved context) across many calls. Plug in your numbers below to see the per-run and monthly savings.
| Breakdown | Tokens / mo | Cost / mo |
|---|---|---|
| Cache writes (cold fills) | 0 | $0 |
| Cache reads (warm hits) | 0 | $0 |
| Cached-path misses (full base) | 0 | $0 |
| Fresh input (always base) | 0 | $0 |
How this calculator works
Caching changes the unit economics of repeated LLM calls. Without caching, every request pays the base input rate for the entire prompt — including the large static prefix you resend each time. With caching, that prefix is billed once at the write rate when the cache is filled, then at the much cheaper read rate on every subsequent hit within the time-to-live window.
The monthly no-cache cost is calls × (cached + fresh) × base / 1e6, where cached and fresh are token counts per call and base is the price per million input tokens. The cached cost splits the prefix into three streams. Let h be the hit rate and R be reuse-per-write: the number of hit-path calls that share one fill, counting the call that performs the write. Hit calls number hits = calls × h, and the number of cold writes is hits / R, since each write is followed by R − 1 reads. So write tokens = (hits / R) × cached, billed at the write rate; read tokens = (hits − hits / R) × cached, billed at the read rate; and the calls × (1 − h) miss calls pay the full cached × base. The fresh, turn-specific tokens are never cached and always cost calls × fresh × base. As in the baseline, each token count is divided by 1e6 before the per-million rate is applied.
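The four streams described above can be sketched in a few lines of Python. This is a minimal illustration, not the calculator's actual source: the function name `monthly_costs`, its parameter names, and the sample rates are assumptions chosen for the example, and the rates are illustrative figures rather than any provider's published prices.

```python
def monthly_costs(calls, cached, fresh, hit_rate, reuse, base, write, read):
    """Return monthly cost streams in dollars.

    calls    -- requests per month
    cached   -- static prefix tokens per call
    fresh    -- turn-specific tokens per call
    hit_rate -- h, fraction of calls on the cached path (0..1)
    reuse    -- R, hit-path calls sharing one fill (>= 1)
    base, write, read -- price per million tokens (hypothetical values below)
    """
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    if reuse < 1:
        raise ValueError("reuse must be at least 1")

    hits = calls * hit_rate          # calls served on the cached path
    writes = hits / reuse            # cold fills
    reads = hits - writes            # warm hits
    misses = calls * (1 - hit_rate)  # cached-path misses, billed at base

    streams = {
        "write": writes * cached * write / 1e6,
        "read": reads * cached * read / 1e6,
        "miss": misses * cached * base / 1e6,
        "fresh": calls * fresh * base / 1e6,
    }
    no_cache = calls * (cached + fresh) * base / 1e6
    cached_total = sum(streams.values())
    return {
        **streams,
        "no_cache": no_cache,
        "cached_total": cached_total,
        "savings": no_cache - cached_total,
    }


# Illustrative inputs only: 100k calls, 10k-token prefix, 500 fresh tokens.
result = monthly_costs(
    calls=100_000, cached=10_000, fresh=500,
    hit_rate=0.9, reuse=10,
    base=3.00, write=3.75, read=0.30,
)
for name, dollars in result.items():
    print(f"{name:>12}: ${dollars:,.2f}")
```

With these assumed inputs the no-cache baseline comes to about $3,150 and the cached total to about $1,030.50, a saving of roughly two thirds. Swap in your own rates and volumes to reproduce the table rows.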
Summing those four streams gives the cached total. The savings figure and percentage compare it against the no-cache baseline, and the progress bar visualizes the proportion eliminated. The key insight competitors miss: caching only wins when the read discount on hits outweighs the write premium on fills. In this model that balance is set by reuse-per-write: when each fill serves too few reads, caching costs more than the baseline, and the hit rate scales how large the gain or loss is rather than changing its sign. Raise reuse-per-write and watch the break-even flip. This is why long-running agent loops and multi-agent fan-out, which hammer the same system prefix thousands of times, see the largest gains, while one-shot calls rarely justify a cache write.
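To make the break-even concrete: subtracting the cached total from the baseline under the model above leaves calls × h × cached × (base − write / R − read × (1 − 1 / R)) / 1e6, so the saving is positive exactly when R exceeds (write − read) / (base − read), provided the read rate is below the base rate. The sketch below computes that threshold. The function names and the sample rates are assumptions made for illustration, not part of the calculator.

```python
def break_even_reuse(base, write, read):
    """Smallest reuse-per-write R at which caching stops losing money.

    Derived from base - write/R - read*(1 - 1/R) = 0.
    Rates are per million tokens; only their ratios matter here.
    """
    if read >= base:
        raise ValueError("read rate must be below base rate for caching to pay off")
    return (write - read) / (base - read)


def saving_per_hit_token(base, write, read, reuse):
    """Rate saved per prefix token on a hit-path call (negative means a loss)."""
    return base - write / reuse - read * (1 - 1 / reuse)


# Hypothetical rates: write carries a 25% premium, read is 10% of base.
base, write, read = 3.00, 3.75, 0.30
print(f"break-even R: {break_even_reuse(base, write, read):.3f}")
for reuse in (1, 2, 10):
    delta = saving_per_hit_token(base, write, read, reuse)
    print(f"R={reuse:>2}: saving per million prefix tokens = ${delta:.3f}")
```

With these assumed rates the threshold sits a little above one call per fill, so a cache entry that is written and never read again loses money, while two calls per fill already come out ahead.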