Frequently Asked Questions

Is GitVelocity really free?

Yes. GitVelocity is completely free. You supply your own Anthropic API key, which means the expensive part -- AI inference -- runs on your account, not ours. All we operate are lightweight servers to orchestrate the scoring pipeline, so our costs stay minimal. We're Headline, a data-driven venture capital firm with our own engineering team. We built GitVelocity for ourselves and our portfolio companies, loved it, and decided to share it with everyone.

Why do I need to bring my own API key?

GitVelocity uses your Anthropic API key to analyze PR diffs. We exclusively use Anthropic models because we think they produce the best scoring results. Your key is encrypted at rest and can only be decrypted at runtime when a score is being generated -- we never see it in the clear. This bring-your-own-key model is also what keeps GitVelocity free: since you cover the AI inference cost directly, we do not need to charge for the product.

Do you store our source code?

No. We only read the diff of each merged pull request -- the same thing you would see in a PR review. We do not pull your full repository or model a graph of your code. The only things we store are the score, its dimensional breakdown, and a brief rationale. The diff is processed and immediately discarded.

Do you sell our data?

No. We're a regulated venture capital firm. We cannot and do not sell data or operate a side business. GitVelocity exists because our engineering team loves building it and sharing it with other engineering organizations.

Can engineers game the system?

GitVelocity is designed to be gaming-resistant. The AI reads actual code diffs and evaluates complexity across six dimensions -- it does not rely on superficial metrics like lines of code or number of commits.

Common gaming strategies do not work:

  • Splitting PRs into smaller pieces reduces each individual score via the Effort Scale Factor. Total velocity stays roughly the same.
  • Adding meaningless code does not increase complexity scores. The AI evaluates the substance of changes, not their volume.
  • Combining unrelated changes into one PR does not reliably increase the Base Score, because the AI assesses coherence and architectural impact, not just surface-level scope.

The most effective way to increase your scores is to take on genuinely complex work and ship it well.
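As a hedged illustration of the first point, here is a tiny sketch with made-up numbers (GitVelocity's actual scoring and Effort Scale Factor are internal; nothing below is the real formula):

```python
# Illustration of why splitting a PR does not inflate total velocity.
# All scores here are invented for the example; GitVelocity's actual
# Effort Scale Factor and rubric are internal to the product.

one_large_pr = 72                # one substantial PR merged to main
split_into_three = [26, 24, 25]  # the same work as three smaller PRs,
                                 # each scaled down by a lower ESF

# The three small PRs sum to roughly the same total as the single PR.
assert abs(one_large_pr - sum(split_into_three)) <= 5
```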

How does test code affect my score?

Test code is handled differently at each stage of the scoring formula.

The base score is calculated from implementation code only. Test files and documentation files are excluded from the base score evaluation. The AI focuses on the complexity of what you built, not how much test code surrounds it.

The Quality dimension (capped at 15 out of 100 points) assesses test quality -- edge case coverage, test types, integration tests -- not test volume. Writing 500 lines of trivial tests scores the same as writing 50 well-targeted ones.

The Effort Scale Factor does include test files in its line count, because writing tests is real engineering effort. However, the ESF is a multiplier on your base score -- so inflating test volume without substantive implementation work produces marginal gains at best. A low base score multiplied by a higher ESF is still a low score.
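The interaction can be sketched with hypothetical numbers. The real base-score rubric and ESF curve are internal to GitVelocity; the `esf()` shape and every value below are assumptions for illustration only:

```python
# Sketch of how base score, test lines, and the ESF interact.
# The esf() curve, its cap, and all scores are invented assumptions,
# not GitVelocity's actual formula.

def esf(total_lines: int) -> float:
    """Hypothetical Effort Scale Factor: grows with line count, slowly."""
    return min(1.0 + total_lines / 2000, 1.5)  # assumed cap for the sketch

def pr_score(base_score: float, impl_lines: int, test_lines: int) -> float:
    # Test files count toward the ESF line count, not the base score.
    return base_score * esf(impl_lines + test_lines)

# Trivial implementation padded with 500 lines of tests:
padded = pr_score(base_score=10, impl_lines=50, test_lines=500)
# Substantive implementation with 50 well-targeted test lines:
solid = pr_score(base_score=60, impl_lines=400, test_lines=50)

assert padded < solid  # a low base score times a higher ESF is still low
```

Even with the padded PR's larger line count pushing its ESF higher, the low base score dominates the result.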

What about chained or stacked PRs?

GitVelocity only scores PRs that merge to your default branch (typically main). This has specific implications for stacked or chained PR workflows.

How it works: If you have a chain of PRs where PR A merges into PR B, PR B merges into PR C, and PR C finally merges into main, only that final merge to main gets scored. The intermediate merges (PR A into B, PR B into C) are not scored because they do not target the default branch.
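The rule above amounts to a filter on each PR's merge target. A minimal sketch -- the `PullRequest` shape and field names are assumptions for illustration, not GitVelocity's actual data model:

```python
# Sketch of the merge-target rule: only merged PRs whose base branch is
# the default branch are scored. Field names are illustrative assumptions.
from dataclasses import dataclass

@dataclass
class PullRequest:
    number: int
    base_branch: str  # the branch this PR merges into
    merged: bool

def scorable(pr: PullRequest, default_branch: str = "main") -> bool:
    return pr.merged and pr.base_branch == default_branch

chain = [
    PullRequest(101, base_branch="feature-b", merged=True),  # A into B
    PullRequest(102, base_branch="feature-c", merged=True),  # B into C
    PullRequest(103, base_branch="main", merged=True),       # C into main
]

# Only the final merge to main survives the filter.
assert [pr.number for pr in chain if scorable(pr)] == [103]
```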

Does that mean the intermediate PRs are lost? Yes, in terms of individual scoring. Only the final merge to main is evaluated, and the score of that one large merge will not equal what the individual PRs would have scored if each had merged directly to main.

Is that unfair? In practice, it does not meaningfully affect anyone's performance picture when you look at trends over time. Consider the math:

  • Hitting 100 on a single PR is extremely rare. The highest score we have seen internally on a 1--2 week project was 81.
  • Touching many components in a single PR does not automatically push the score toward 100. The AI evaluates coherence and depth, not just breadth.
  • An engineer who splits work into smaller PRs and merges each one directly to main will tend to accumulate more total points than someone who stacks the same work. Some might call this unfair, but the difference washes out over time.
  • Month-over-month trends naturally smooth out these variations. Top performers and those needing support stand out regardless of which PR strategy they use.

Our recommendation: Run a historical backfill so you have enough data for trends to emerge. Day-to-day scores can be noisy -- a stacked PR workflow might undercount one week, while a burst of small merges might overcount another. But trends over weeks and months are genuinely useful and tell an accurate story regardless of how engineers structure their PRs.

Does GitVelocity only count shipped code?

Yes. Only merged pull requests are scored. Draft PRs, open PRs, and PRs closed without merging are not included. This is a deliberate design choice: GitVelocity measures what actually ships to production, not what was attempted.

How does AI-generated code affect scores?

GitVelocity treats AI-generated code identically to human-written code. This is an intentional, opinionated design choice.

Code is code, regardless of whether a human or machine wrote it. In the AI era, engineering productivity increasingly means how effectively you can shepherd AI-generated code to production -- reviewing it, integrating it, ensuring quality, and shipping it.

GitVelocity measures the complexity of what ships, not how it was produced. An engineer who uses AI to ship more complex work faster is demonstrably more productive, and their scores will reflect that.

How consistent are the scores?

Scoring consistency is a core design requirement. The same PR scored multiple times will land within a 2--4 point range. This consistency comes from several mechanisms:

  • 18 anchored reference examples that the AI uses as calibration points across the scoring spectrum
  • Structured rubric evaluation across six defined dimensions with specific scoring criteria
  • Deterministic prompt design that minimizes variance between scoring runs

Scores are consistent enough to support meaningful trend analysis across weeks and months.

What languages does GitVelocity support?

GitVelocity is language-agnostic. The AI analyzes code structure, architectural patterns, and complexity regardless of programming language. Whether your team writes TypeScript, Python, Go, Rust, Java, or any other language, PRs are scored using the same rubric and the same evaluation criteria.

Can I use GitVelocity with private repositories?

Yes. GitVelocity works with both public and private repositories on GitHub, Bitbucket, and GitLab. The integration requires read access to pull requests in order to analyze diffs when they merge. Code is analyzed in real time and is not stored after scoring is complete.

What are your security standards?

We operate in both the US and Europe and hold ourselves to European regulatory standards -- the higher bar. All data is encrypted at rest and in transit. We follow CIS benchmarks, conduct regular penetration testing, and are fully compliant with GDPR and the EU Digital Operational Resilience Act (DORA). For the full details, see our Security documentation and Privacy Policy.