Lines of Code, Commit Counts, and Other Metrics That Measure Nothing
Lines of code rewards verbosity. Commit counts reward noise. PR counts reward splitting. These metrics feel precise and measure nothing. Here's why.
There's something comforting about counting things. Numbers feel objective. A dashboard full of charts looks like data-driven management. Lines of code, commits per week, PRs merged — they're all easy to pull from Git and easy to put on a slide.
They're also meaningless.
I don't say this as someone who's never used them. I've sat in leadership meetings where commit counts were cited as evidence of team productivity. I've seen engineering reports that proudly showed lines of code trending upward. I've watched PR-count leaderboards create incentives that had nothing to do with shipping good software.
These metrics are vanity metrics. They feel like signal. They're noise.
Lines of Code: The Original Sin
Lines of code (LOC) is the metric that refuses to die. It persists because it's trivially easy to measure and intuitively satisfying — more lines, more work, right?
Wrong. Consider these scenarios:
The refactor. An engineer spends a week cleaning up a critical module. They eliminate duplication, simplify control flow, and remove dead code. The result: 500 fewer lines of code. The system is more maintainable, more testable, and less error-prone. By LOC, this engineer had negative productivity.
The copy-paste. Another engineer implements a feature by copying an existing module and modifying it. They produce 800 lines of code in two days. The code works, but it's duplicated logic that will diverge over time and create maintenance burden. By LOC, this engineer looks like a hero.
The one-liner. A senior engineer finds the root cause of a bug that's been causing intermittent production failures for months. The fix is a single line — a missing null check. By LOC, this barely registers. In reality, it might be the most valuable engineering work done that quarter.
LOC measures volume. Engineering value has almost no correlation with volume. A 10-line fix can be more valuable than a 1,000-line feature. A deleted file can be more impactful than a created one.
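To make the one-liner scenario concrete, here's a hypothetical sketch of such a fix (the class and function names are illustrative, not from any real codebase). The valuable change is a single line, yet by LOC it barely registers:

```python
class Profile:
    def __init__(self, nickname):
        self.nickname = nickname

class User:
    def __init__(self, name, profile=None):
        self.name = name
        self.profile = profile  # may be None for new accounts

# Buggy version: crashes intermittently whenever profile is None.
def display_name_buggy(user):
    return user.profile.nickname

# The fix: a one-line guard that falls back to the account name.
def display_name(user):
    return user.profile.nickname if user.profile else user.name

print(display_name(User("carol")))               # → carol
print(display_name(User("dana", Profile("D"))))  # → D
```

The diff is one line. The months of intermittent production failures it ends are invisible to any volume-based metric.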
And now with AI tools, LOC is even more absurd. An engineer using Claude Code can generate 500 lines of boilerplate in minutes. Does that make them 10x more productive than one who writes 50 lines of carefully considered architecture? Obviously not.
Commit Counts: Measuring Git Habits
Commit count measures how often an engineer runs git commit. That's it. It tells you nothing about what's in those commits.
The problems are well-known:
Gaming is trivial. Want to improve your commit count? Commit more often. "Fix typo" becomes a commit. "Add comment" becomes a commit. Updating a config file becomes three commits instead of one. The leaderboard goes up. Nothing of substance changes.
Work styles vary. Some engineers commit atomically — every logical change gets its own commit. Others batch commits into larger, self-contained units. An engineer who makes one commit containing a complete feature looks less productive than one who makes fifteen commits containing incremental fragments of the same feature.
Complex work looks idle. An engineer investigating a production issue for three days — reading logs, reproducing the bug, understanding the system — makes zero commits during that time. The fix, when it comes, is one commit. By commit count, they were unproductive for three days and then did the bare minimum.
AI makes it worse. Agentic coding tools often produce complete features in single sessions. The engineer who uses AI to scaffold an entire service in one sitting produces one commit. The engineer manually building incrementally produces ten. Commit count now inversely correlates with AI-assisted productivity.
PR Counts: Conflating Volume with Value
PR count gets slightly closer to something useful — at least it measures shipped artifacts rather than git commands. But it still conflates volume with value.
The splitting incentive. When PRs are counted, engineers learn to split work into the smallest possible units. One feature becomes seven PRs: add the model, add the service, add the controller, add the route, add the tests, add the docs, fix the thing you forgot. Each PR is trivial. The total is one feature. The count says seven.
Size blindness. A PR that adds a new field to an API response (10 minutes of work) counts the same as a PR that implements a distributed job queue with retry logic and dead-letter handling (a week of complex engineering). One PR equals one PR. The metric can't see the difference.
Quality blindness. A PR that introduces a SQL injection vulnerability counts the same as one that hardens authentication. A PR with no tests counts the same as one with comprehensive coverage. Volume doesn't know about quality.
GitHub Activity Graphs: The Green Squares Problem
GitHub's contribution graph — those green squares on your profile — has become an informal resume signal. More green squares, more active developer.
Or so the logic goes.
This creates predictable behavior:
- Engineers make trivial commits to keep their streak alive
- Weekends get unnecessary activity to fill gaps
- Private repositories (where much real work happens) don't show up
- Maintaining open-source projects that benefit from careful, infrequent updates looks worse than churning out daily noise
The green squares measure presence, not productivity. They reward consistency of git activity, not quality of engineering work.
Why We Keep Using Them
If these metrics are so obviously flawed, why do they persist?
They're easy to collect. Git gives you all of these numbers for free. No setup, no tooling cost, no configuration. Any intern can write a script that counts commits per engineer per week.
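To see how little effort that script takes, here's a minimal sketch. It tallies commits per author from `git log --format='%an'` output; the sample data is a stand-in to keep the example self-contained:

```python
from collections import Counter

def commits_per_engineer(log_lines):
    """Count commits per author, given `git log --format='%an'` lines."""
    return Counter(name.strip() for name in log_lines if name.strip())

# In practice you'd pipe in real output, e.g.:
#   git log --since='1 week ago' --format='%an'
sample = ["Alice", "Bob", "Alice", "Alice", "Bob"]

print(commits_per_engineer(sample).most_common())
# → [('Alice', 3), ('Bob', 2)]
```

A dozen lines, zero setup, and it produces exactly the kind of leaderboard this post argues against.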
They feel objective. Numbers on a dashboard look like data. They give leadership something concrete to discuss, even if what's being discussed is meaningless.
The alternative was nothing. Before these metrics, engineering productivity was entirely subjective — manager vibes, hallway reputation, who spoke up in meetings. Counting things felt like progress, even when the things being counted didn't matter.
Goodhart's Law is subtle. When a metric first gets introduced, it often correlates loosely with real productivity. Engineers who commit more do tend to be doing more work. The correlation breaks down as people optimize for the metric instead of the outcome, but by then the metric is entrenched.
What Actually Matters
The problem with vanity metrics isn't measurement itself — it's measuring the wrong things. The question isn't "how much activity did this engineer generate?" It's "what did they ship, and how complex was it?"
That requires evaluating the actual output: the code that merged to production. Not how many times they committed. Not how many lines they wrote. Not how many PRs they opened. The substance of what they actually delivered.
This is why we built GitVelocity to score every merged PR across six dimensions of complexity — Scope, Architecture, Implementation, Risk, Quality, and Performance & Security. A brilliant 30-line security fix scores higher than 500 lines of copy-paste boilerplate because the engineering complexity is higher. A single PR that implements a distributed processing pipeline scores higher than ten PRs that each add a config field.
The score is grounded in the actual code diff. You can't inflate it by committing more often or splitting your work into smaller pieces. The complexity is in the code, not the packaging.
From Volume to Value
The engineering industry is slowly moving past vanity metrics. DORA gave us process health. Story points gave us a shared planning language (however flawed). But we've been missing the most fundamental measure: what actually shipped, and how complex it was.
That's not a question you can answer by counting things. It requires understanding the code itself — its scope, its architecture, its risk, its quality. AI now makes that possible at scale.
The era of vanity metrics should be ending. The numbers were always made up. The dashboards were always noise. It's time to measure what matters.
GitVelocity measures engineering velocity by scoring the complexity of every merged PR — not the volume.
Conrad is CTO and Partner at Headline, where he leads data-driven investment across early stage and growth funds with over $4B in AUM. Before becoming an investor, he founded Munchery (raised $130M+) and held engineering and product leadership roles at IAC and Convio (IPO 2010). He and the Headline engineering team built GitVelocity to help engineering organizations roll out agentic coding and measure its impact.