
Why Developer Productivity Measurement Is Now a Board-Level Conversation

Engineering went from 5% of headcount to 30-50%. Boards want the same visibility they get for sales. The tools to deliver it finally exist.

When I was an engineer, nobody in finance could tell you what the engineering team actually produced last quarter. They could tell you headcount. They could tell you burn rate. They could tell you how many Jira tickets moved to "Done." But what did the company get for its $4M quarterly engineering spend? Shrugs all around.

This used to be acceptable. When engineering was a small support function — five people in a corner writing internal tools — the black box was tolerable. The dollar amounts were small enough that nobody demanded rigorous measurement.

That era is over.

The Math Changed

At most technology companies, engineering is now 30-50% of total headcount. At startups, it's often higher. When you're spending millions per quarter on a single function, "trust us, we're building stuff" stops being a satisfying answer to the board.

Think about how every other function operates. Sales has pipeline, close rates, and revenue per rep. Marketing has CAC, attribution models, and conversion funnels. Finance has... well, finance literally is measurement. Customer success has NRR, CSAT, and churn analysis.

Engineering has story points.

I've sat in dozens of board meetings at Headline, where I work as an investor. The pattern is always the same. The CRO walks through a crisp revenue forecast with clear leading indicators. The CMO shows pipeline attribution by channel. Then the CTO gets up and talks about "velocity" in terms that nobody outside the engineering org can connect to business outcomes.

It's not the CTO's fault. They've never had the tools to do better.

Why This Problem Persisted So Long

Engineering measurement is genuinely hard. I wrote about this in detail in Engineering Measurement Is Broken, but the short version is: software development has properties that resist simple quantification.

A one-line config change and a database migration rewrite both show up as "one PR." An engineer who deletes 300 lines of dead code looks less productive than one who copy-pastes 300 lines of boilerplate. The engineer who spends three days preventing a production outage has nothing to show in any dashboard.

Previous generations of measurement tools tried to solve this with proxies — commit counts, lines of code, story points, cycle time. Each proxy captured a sliver of truth while missing the full picture. Worse, most of them were trivially gameable. Engineers figured out the incentives within a week and optimized for the metric instead of the outcome.
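
To make the gameability concrete, here's a toy sketch in Python with made-up numbers. Count-based proxies can't tell real work from packaging:

```python
# A minimal sketch of why count-based proxies are gameable.
# The numbers are invented; the point is that the same underlying
# work can produce wildly different scores depending on packaging.

def proxy_score(commits: list[dict]) -> dict:
    """Naive proxy metrics: commit count and lines added."""
    return {
        "commits": len(commits),
        "lines_added": sum(c["added"] for c in commits),
    }

# One engineer ships a real 12-line fix as a single commit...
honest = [{"added": 12}]

# ...another splits the same fix into six commits and pads each
# with copy-pasted boilerplate.
gamed = [{"added": 52} for _ in range(6)]

print(proxy_score(honest))  # {'commits': 1, 'lines_added': 12}
print(proxy_score(gamed))   # {'commits': 6, 'lines_added': 312}
```

The proxy rewards exactly the behavior you don't want, and engineers see that immediately.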

So leadership gave up. They accepted the black box because the alternative — measurement that actively distorted behavior — was worse than no measurement at all.

What Changed: AI Can Read Code

The reason this problem is solvable now, in 2026, and wasn't solvable in 2020 is simple: large language models can understand code.

Not in the way a linter understands code — checking for syntax violations and style rules. I mean genuinely understanding what a code change does, how architecturally significant it is, how much risk it introduces, and how much skill it took to write.

When you can evaluate the actual substance of engineering work, you don't need proxies anymore. You don't need to count commits or measure cycle time or estimate story points. You can look at what was actually built and assess its complexity directly.

This is the shift. Not from "unmeasured" to "measured with bad metrics" — we already tried that and it failed. The shift is from proxy metrics to direct evaluation of the work itself.
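
As a rough sketch of what direct evaluation could look like in practice: give the model the diff plus a rubric and ask for a structured assessment. Everything below (the rubric wording, the score dimensions, the `complete` function) is an illustrative placeholder, not any particular vendor's API or GitVelocity's actual pipeline:

```python
import json

RUBRIC = """You are reviewing a merged pull request.
Given the diff below, rate the change from 1 to 10 on each dimension
and return JSON: {"complexity": ..., "risk": ..., "skill": ...,
"summary": "one sentence"}."""

def score_pr(diff: str, complete) -> dict:
    """Score a merged PR by evaluating the diff itself.

    `complete` stands in for whatever LLM client you use: any
    function that takes a prompt string and returns the model's
    text response.
    """
    response = complete(f"{RUBRIC}\n\n--- DIFF ---\n{diff}")
    return json.loads(response)
```

The specific rubric matters less than the input: the model reads the change itself, not the ceremony around it, which is what makes the assessment hard to inflate by splitting or padding PRs.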

This Is Not Surveillance

I need to address the elephant in the room, because engineers justifiably hate being measured.

Every previous attempt at engineering measurement felt like surveillance because it was. Keystroke logging. Screenshot monitoring. Commit frequency tracking. These tools treated engineers like factory workers on an assembly line, measuring inputs (time, keystrokes, activity) rather than outputs (working software).

Measuring output is fundamentally different from monitoring activity. Nobody cares when you work, how many hours you're online, or how many keystrokes you logged. What matters is: when you merge a PR, how complex and valuable was the work?

The analogy I use: sales doesn't monitor how many hours reps spend on the phone. They measure revenue closed. Marketing doesn't track how many emails were drafted. They measure pipeline generated. Measuring engineering output with the same rigor isn't surveillance — it's treating engineering like the critical business function it has become.

At Headline, when we started measuring output across our portfolio companies, the reaction from engineers surprised us. Most of them wanted visibility into their own work. The high performers especially. They'd been watching mediocre colleagues coast for years with no accountability, and they were tired of it.

The CFO Question

Here's the conversation that's happening in boardrooms right now, and it's going to force the issue whether engineering leaders are ready or not.

CFOs are asking: "We spend $15M a year on engineering. What's the return?"

In a zero-interest-rate environment, you could dodge this question. Capital was cheap, growth at all costs was the strategy, and nobody scrutinized engineering spend too closely. That environment is gone.

Today, every dollar spent on engineering competes with every other dollar in the business. If marketing can prove $5 of pipeline for every $1 spent, and engineering can't articulate its return at all, guess which budget gets cut when times get tight.

This isn't a hypothetical. I've watched it happen at portfolio companies. Engineering teams that couldn't demonstrate impact got flattened while the teams that had data kept their headcount.

The engineering leaders who get ahead of this — who proactively build measurement systems before the board demands them — will have a massive advantage. They'll be able to advocate for their teams with data, justify hiring plans with output trends, and identify where investment generates the highest return.

What Good Measurement Looks Like

Good engineering measurement has a few properties:

It evaluates output, not activity. Nobody cares how many hours you worked. The question is what you built and how complex it was. This is the fundamental shift from vanity metrics to substance.

It's gaming-resistant. If engineers can inflate their scores by splitting PRs or padding commits, the metric is worse than useless. The measurement system needs to evaluate the actual code change, not the ceremony around it.

It works at the individual level. Team averages hide everything interesting. You need to see individual contributions to understand skill gaps, identify top performers, and have meaningful development conversations.

It's consistent over time. You need to be able to compare Q1 to Q3 and know the yardstick didn't change. This is where AI-powered scoring has a massive advantage over human estimation — the model applies the same criteria every time.

Engineers can see their own scores. Transparency is non-negotiable. If engineers can't see and understand how they're being evaluated, you'll get resistance instead of buy-in. When we made scores visible at Headline, engineers started competing to improve rather than resenting the system.
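
Pulling those five properties together, here's a rough sketch of what a per-PR score record might contain. The field names are mine, for illustration, not GitVelocity's actual schema:

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class PRScore:
    """Illustrative per-PR score record (hypothetical fields)."""
    pr_url: str        # output, not activity: one record per merged PR
    author: str        # individual-level attribution, not team averages
    merged_on: date    # fixed criteria over time keep Q1 vs Q3 comparable
    complexity: int    # judged from the diff itself, so splitting
    risk: int          #   or padding PRs doesn't inflate the score
    skill: int
    rationale: str     # visible to the engineer, so the score is legible
```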

The Window Is Now

We're at an inflection point. The tools exist to measure engineering output properly for the first time. The business pressure to do so is intensifying. And the companies that figure this out first will have a structural advantage in how they allocate engineering investment, retain top talent, and communicate engineering value to the business.

The black box era is ending. The question is whether you'll be leading the transition or reacting to it.

GitVelocity measures engineering velocity by scoring every merged PR using AI. It gives engineering leaders the visibility they need to treat engineering investment with the same rigor as every other business function.

See how it works.

Written by Conrad Chu

Conrad is CTO and Partner at Headline, where he leads data-driven investment across early-stage and growth funds with over $4B in AUM. Before becoming an investor, he founded Munchery (raised $130M+) and held engineering and product leadership roles at IAC and Convio (IPO 2010). He and the Headline engineering team built GitVelocity to help engineering organizations roll out agentic coding and measure its impact.