5 min read · Engineering Culture

Engineers Hate Being Measured (And They're Right To)

Most engineering metrics are terrible. Engineers are right to resist them. But the answer isn't no measurement — it's better measurement.

I was an engineer before I was a founder. I remember sitting in sprint planning, watching the tech lead assign an 8 to something that took me 2 days, and a 3 to something that broke my brain for a week. The numbers never matched reality. And when those numbers showed up in a performance review, it felt like a betrayal.

So when I tell you that engineers are right to be skeptical of measurement, I'm not being diplomatic. I'm speaking from experience. The history of engineering measurement is a history of bad metrics creating bad incentives.

A Brief History of Bad Metrics

Lines of code. The original sin. Measuring output by volume incentivizes verbosity. The engineer who writes 200 lines of clean code gets outscored by the one who writes 800 lines of copy-pasted spaghetti. Worse, it punishes refactoring — deleting 500 lines of dead code shows up as negative productivity.

Commit count. Slightly more sophisticated, equally gameable. Engineers learn to make tiny commits. "Fix typo" becomes a productivity signal. Meanwhile, the engineer spending three days on a critical security fix produces one commit and looks idle.

Story points. The Agile world's answer to everything. But points are self-reported estimates, not measures of output. Teams inflate them. Managers compare them across teams despite being told not to. They become a negotiation tool rather than a planning tool.

PR count. Close, but still wrong. Incentivizes splitting work into the smallest possible PRs. One meaningful feature becomes seven trivial PRs. The actual value delivered is identical.

DORA metrics. Deployment frequency, lead time, change failure rate, time to restore. These measure process health, not individual contribution. Useful at the team level, but they tell you nothing about which engineers are struggling or thriving.

Every one of these was introduced with good intentions. Every one was gamed, misused, or misinterpreted within months.
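The lines-of-code failure mode is easy to demonstrate. A naive net-LOC score rewards the copy-paste and punishes the cleanup — a minimal sketch, with all the diff numbers invented for illustration:

```python
# Naive lines-of-code "productivity": net lines added per engineer.
# All diff data below is invented to illustrate the failure mode.
def net_loc(diffs):
    """Sum (added - deleted) across an engineer's diffs."""
    return sum(added - deleted for added, deleted in diffs)

clean_feature = [(200, 10)]   # 200 tidy lines plus a small cleanup
spaghetti = [(800, 0)]        # 800 copy-pasted lines
refactor = [(30, 530)]        # deletes 500 lines of dead code

scores = {
    "clean": net_loc(clean_feature),
    "spaghetti": net_loc(spaghetti),
    "refactor": net_loc(refactor),
}
print(scores)  # the refactorer "produces" negative output
```

The spaghetti author tops the chart and the refactorer shows up at -500 — the metric is working exactly as designed, which is the problem.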

Why Engineers Are Right to Be Skeptical

Engineers resist measurement for three legitimate reasons:

1. Goodhart's Law Is Real

"When a measure becomes a target, it ceases to be a good measure." This isn't theoretical — it's documented reality in every organization that has tried to measure engineering output. The moment a metric appears on a dashboard, engineers start optimizing for the metric instead of the outcome.

2. Context Gets Ignored

A senior engineer mentoring three juniors will have lower individual output than one who puts their head down and codes. An engineer doing critical infrastructure work that prevents outages produces no visible features. An engineer investigating a complex bug for a week might find it was a one-line fix.

Most metrics can't capture this context. Leaders see the numbers, not the story.

3. Measurement Usually Precedes Punishment

In too many organizations, the sequence goes: introduce metrics, identify "low performers," initiate PIPs. Engineers have seen this pattern enough times to recognize it. When you say "we're measuring productivity," they hear "we're building a case to fire people."

So Should We Just... Not Measure?

No. And engineers know this too, even if they won't say it publicly.

The absence of measurement creates its own problems:

  • Invisibility. Without data, promotions go to the most visible engineers, not the most impactful ones. The engineer quietly maintaining critical infrastructure gets overlooked while the one presenting at all-hands gets promoted.
  • Bias. Without objective data, performance evaluation rests on manager perception: recency bias, similarity bias, and proximity bias that favors in-office engineers over remote ones.
  • No early warning. You don't know a team is struggling until it's obvious. By then, engineers are burned out and projects are behind.
  • AI opacity. As AI tools transform workflows, leaders need to understand what's changing. Without measurement, you can't tell if your AI investment is working.

The answer isn't no measurement. It's measurement that doesn't suck.

What Better Measurement Looks Like

  1. Measures output, not activity. Don't track hours online or Slack messages. Measure what shipped. The only artifact that matters is code that reaches production.

  2. Objective and consistent. The same work gets the same score regardless of who did it or what their manager thinks. No mood. No politics. No recency bias.

  3. Transparent. Engineers see exactly how their score is calculated and why a specific PR scored what it did. Black-box metrics breed distrust.

  4. Gaming-resistant. Grounded in the actual artifact — the code itself — not self-reported estimates or easily manipulated proxies.

  5. Captures complexity, not volume. A brilliant 30-line security fix should score higher than a 500-line copy-paste integration.
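To make the principles concrete, here is a hypothetical scorer — emphatically NOT GitVelocity's actual algorithm; every factor and weight below is invented. It illustrates two of the criteria: complexity over volume (log-scaled size, so 500 lines isn't worth 10x 50 lines) and transparency (the per-factor breakdown is returned alongside the score):

```python
# Hypothetical PR scorer illustrating the principles above -- NOT
# GitVelocity's actual algorithm. All factors and weights are invented.
import math

def score_pr(lines_changed, files_touched, cyclomatic_delta, test_lines):
    """Return (score, breakdown) so every factor is visible to the engineer."""
    breakdown = {
        # log-scaled size: a 500-line copy-paste isn't 10x a 50-line fix
        "size": round(math.log2(1 + lines_changed), 2),
        # cross-cutting changes tend to be harder than single-file ones
        "scope": round(math.log2(1 + files_touched), 2),
        # complexity of the logic actually added or removed
        "logic": round(0.5 * abs(cyclomatic_delta), 2),
        # tests shipped alongside the change
        "tests": round(math.log2(1 + test_lines), 2),
    }
    return round(sum(breakdown.values()), 2), breakdown

# A dense 30-line security fix vs. a 500-line copy-paste integration:
fix_score, fix_parts = score_pr(30, 2, 12, 40)
paste_score, paste_parts = score_pr(500, 3, 1, 0)
print(fix_score, fix_parts)    # the small, logic-heavy fix outscores
print(paste_score, paste_parts)  # the large, shallow paste
```

Under these invented weights the 30-line fix beats the 500-line paste, and an engineer disputing a score can point at the exact factor they disagree with — which is what makes a metric transparent rather than a black box.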

From Skepticism to Competition

When we rolled out GitVelocity internally at Headline, the reaction was exactly what you'd expect.

Phase 1: Skepticism. "Can AI really judge my code?" Initial resistance was natural and expected.

Phase 2: Testing. Engineers started checking their scores. They'd look at a PR they were proud of and see if the system recognized the complexity. They'd look at a simple fix and see if it scored low. They were validating.

Phase 3: Acceptance. The scores consistently made sense. Trust built gradually.

Phase 4: Competition. This one surprised us. Engineers started wanting to improve their scores. Weekly meetings began celebrating top performers. People put their numbers in and competed. Natural gamification emerged — not because we designed it, but because fair measurement creates healthy motivation.

Something else happened we didn't anticipate: some junior engineers started outperforming seniors on pure velocity metrics. AI tools had unlocked them. They were shipping complex work that would have been beyond their reach a year ago. Without objective measurement, we never would have seen it.

Having the Conversation

If you're introducing measurement to your team, how you communicate matters as much as the tool.

What to say:

  • "This measures the complexity of shipped code, not your performance as a person."
  • "Every score is transparent — you can see exactly why each PR scored what it did."
  • "Simple PRs scoring low is normal. Not every PR needs to be complex."
  • "We're using this to understand capacity and identify where people might need support."

What NOT to say:

  • "We're tracking individual productivity." Frame it as team visibility.
  • "Low scores mean low performance." They don't. They mean simple work.
  • "We'll be comparing engineers against each other." Use velocity for trends, not rankings.

What to actually do:

  • Share your own scores if you commit code. Lead by example.
  • Use the data to celebrate wins, not to punish.
  • When velocity drops, ask "what's blocking you?" not "why aren't you performing?"
  • Give the system time. Trust took time at Headline — but it stuck.

The Trust Equation

Engineers will accept measurement when the metric is fair, the data is transparent, and leadership uses it constructively. That's a high bar. Most measurement systems fail on at least one of these criteria.

But it's not an impossible bar. It just requires intentional design — of the metric itself, of how it's presented, and of how leadership responds to the data.

The goal isn't to make engineers love being measured. It's to make their work visible, their contributions recognized, and their careers more objective. That's worth building.


GitVelocity measures engineering velocity by scoring every merged PR using AI. Every score is transparent and explainable.

See how it works.

Written by Conrad Chu

Conrad is CTO and Partner at Headline, where he leads data-driven investment across early stage and growth funds with over $4B in AUM. Before becoming an investor, he founded Munchery (raised $130M+) and held engineering and product leadership roles at IAC and Convio (IPO 2010). He and the Headline engineering team built GitVelocity to help engineering organizations roll out agentic coding and measure its impact.