Prompt Caching

What prompt caching is

Prompt caching lowers what you pay Anthropic per review by reusing the parts of the prompt that don't change between reviews. GitVelocity calls Anthropic with your own API key, so the savings land on your invoice, not ours.

It only works on Anthropic Claude models. If you've picked an OpenRouter model in Score Settings (even one that points at Claude under the hood), the badge on Usage will read "Caching off" and the rest of this page doesn't apply.

What's actually cached

Most of what we send to Anthropic on a review is identical every time: the scoring rubric, the format requirements, and your team's scoring guidelines. About 5,000 tokens of stable content. That's what gets cached.

Everything else changes per PR: the repo, the PR number, the diff, the commit metadata, the pre-computed metrics. None of that is cached, and shouldn't be.

On the first review we pay full price to write the stable part to the cache. If the next review lands while the cache is still warm, we pay about 10% of the input price to read it back. The model's output is always billed at full price; caching only touches the input prompt.

Per-organization dynamic TTL

Anthropic gives us exactly two cache lifetimes: 5 minutes and 1 hour. There's no 6-hour or 24-hour option. That's the whole menu, not a GitVelocity choice. The 1-hour cache is more expensive to write (2× the base input price) but lasts longer; the 5-minute cache writes at 1.25× and refreshes for free on every hit.

GitVelocity picks one of three settings per org, rechecked at most every 24 hours from the traffic we observe:

  • 1-hour cache. The default, and best for steady review traffic spread across the day.
  • 5-min cache. Picked when your reviews cluster in tight bursts. The cheaper write wins when reads happen within minutes of each other.
  • Caching off. Picked when your review pace is so slow even the cheapest write would never pay back before the cache expires.

The current choice shows on the badge on the Usage page. There's nothing to configure; GitVelocity sets it from your traffic.

What if my team commits a couple of times a day?

You'll see "Caching off" on the badge. With multi-hour gaps between reviews, the cache expires before anyone reads it, and every write would be wasted spend. GitVelocity sees that pattern and stops writing the cache for you, so you pay the same as if caching didn't exist. The worst case here is "no caching benefit," not "caching cost you money."

If your pace picks up later (a refactor week, a release crunch), the 24-hour recheck will flip you back to a 5-min or 1-hour cache the next time it runs.

Backfill pre-warming

When a new org runs its first historical backfill, GitVelocity fires one warm-up call before the actual scoring requests go out. The warm-up writes the rubric and guidelines into the cache, so the first real review hits a warm cache instead of a cold one. Without it, every backfill would start with a wasted cold-cache write at the top.

Pre-warming runs once per backfill job. Not once per PR.

Reading the cache section

The Usage page shows three numbers for the period you've selected.

Hit rate (the hero number) is cache reads over total cache traffic (reads + writes). Pre-warm writes count on the writes side, so a backfill day won't look better than it actually was. A 75% hit rate means three of every four cache-eligible tokens came from cache instead of full-price input.

Estimated savings is what you'd have paid without caching, minus what you actually paid. It's an estimate from Anthropic's published cache prices, so your real invoice line items may round differently. The math uses the 1-hour write multiplier (2.0×) as a conservative lower bound, so orgs currently on the 5-min cache see slightly under-reported savings on their writes.

Token breakdown is collapsed by default. Open it for a stacked bar of regular input vs. cache writes vs. cache reads.

What hit rate to expect

Hit rate depends almost entirely on the shape of your traffic.

Bursty webhook traffic, where merges arrive within minutes of each other, usually lands at 70-90%. The first review in the burst writes the cache; the next three or four read it.

Steady pace, a couple of PRs per hour during the workday, usually lands at 40-60%. The 1-hour TTL helps, but you still pay a fresh write at the top of each hour.

Concurrency races drag the number down too. When several PRs hit our scoring workers at the same time, each worker can miss the cache before any of them has finished writing it. The first burst of the day is usually the worst; later bursts catch up.

A hit rate of 8% isn't a bug. It's a signal that your traffic is too spread out for the cache window. If the math stops favoring caching at all, the next 24-hour recheck will switch you to a shorter TTL or pause caching entirely.