The DORA Industrial Complex
DORA metrics started as research. They became a religion. Somewhere along the way, we stopped asking whether our teams are shipping great software and started asking whether our pipelines look fast enough on a dashboard.
I need to say something that might lose me friends at conferences: DORA metrics have done more harm than good to engineering culture.
Not the research. The research was excellent. Nicole Forsgren, Jez Humble, and Gene Kim conducted a rigorous, multi-year study of 30,000+ professionals and found meaningful correlations between certain delivery practices and organizational performance. That's valuable work. I respect it.
What I don't respect is what the industry did with it.
How Research Became Religion
The original DORA research made a careful, specific claim: high-performing organizations tend to exhibit high deployment frequency, low lead time, low change failure rate, and fast recovery times. The operative word is tend. This was a population-level observation across thousands of companies in wildly different contexts.
Here's what happened next: the entire engineering leadership ecosystem took a correlational finding and reversed the causal arrow. "High-performing teams have these metrics" became "improve these metrics and you'll become high-performing." That's not what the research said. That's not how causation works. And it's led us somewhere genuinely bad.
An entire industry grew around this inversion. DORA dashboards. DORA consultants. DORA certifications. DORA benchmarks that teams agonize over at quarterly reviews. I've sat in board meetings where a CTO presented their DORA scores like a report card, and nobody in the room asked the obvious follow-up: "What did you actually ship?"
That's the DORA Industrial Complex. And it's everywhere.
Four Metrics About the Pipe. Zero About the Water.
Let's be direct about what DORA actually measures:
- Deployment frequency: How often does code reach production?
- Lead time for changes: How quickly does a commit get deployed?
- Change failure rate: What percentage of deployments break something?
- Mean time to recovery: How fast do you recover from incidents?
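To make concrete just how mechanical these four numbers are, here is a minimal sketch of computing them from a list of deployment records. The record shape (commit time, deploy time, failure flag, recovery minutes) is an illustrative assumption, not any particular tool's schema:

```python
from datetime import datetime

# Hypothetical deployment records:
# (commit_time, deploy_time, failed, recovery_minutes)
deploys = [
    (datetime(2024, 1, 1, 9), datetime(2024, 1, 1, 10), False, 0),
    (datetime(2024, 1, 1, 11), datetime(2024, 1, 1, 13), True, 45),
    (datetime(2024, 1, 2, 9), datetime(2024, 1, 2, 9, 30), False, 0),
    (datetime(2024, 1, 2, 14), datetime(2024, 1, 2, 15), False, 0),
]
days_observed = 2

# Deployment frequency: deploys per day over the observation window
deployment_frequency = len(deploys) / days_observed  # 2.0

# Lead time for changes: mean commit-to-deploy interval, in hours
lead_time_hours = sum(
    (deploy - commit).total_seconds() / 3600
    for commit, deploy, _, _ in deploys
) / len(deploys)  # 1.125

# Change failure rate: share of deployments that broke something
failures = [d for d in deploys if d[2]]
change_failure_rate = len(failures) / len(deploys)  # 0.25

# Mean time to recovery: average recovery time across failed deployments
mttr_minutes = sum(d[3] for d in failures) / len(failures) if failures else 0.0  # 45.0
```

Notice that nothing in these four computations ever inspects the diff itself. The inputs are timestamps and a boolean.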
These are all about the delivery pipeline. Every single one. They measure the container, not the contents. The conveyor belt, not what's on it.
You know what DORA doesn't tell you?
- Whether the code that deployed was any good
- Whether it solved a real problem
- Whether it was trivial boilerplate or a sophisticated architectural improvement
- Whether your team shipped more or less complex work this quarter
- Whether your best engineer is burning out on low-impact work
- Whether your AI tools are actually making people more productive
A team can achieve elite DORA status while shipping nothing but config changes and copy updates. I've seen it. They deploy 30 times a day, their lead time is under an hour, their failure rate is near zero, and their MTTR is measured in minutes. Their DORA dashboard glows green. Their product hasn't meaningfully improved in months.
Meanwhile, a team doing hard, ambitious work — rewriting a core subsystem, building a new data pipeline, tackling genuine technical debt — might deploy less often, with longer lead times and higher failure rates, because that's what complex work looks like. DORA would flag them as underperforming. In reality, they're the ones moving the product forward.
The Goodhart's Law Trap (And DORA Walked Right Into It)
"When a measure becomes a target, it ceases to be a good measure." — Marilyn Strathern's formulation of Goodhart's Law
Every article about DORA, including the well-intentioned CodePulse guides and conference talks, dutifully recites Goodhart's Law and then proceeds to give you benchmark targets anyway. "Elite teams deploy multiple times per day." "Lead time under one hour." "Change failure rate under 5%."
They'll tell you these aren't targets. Then they'll put them in a dashboard with red/yellow/green indicators. They'll tell you not to optimize for the metrics. Then they'll build an entire assessment framework around improving your scores. The cognitive dissonance is remarkable.
Here's the uncomfortable truth: DORA metrics are inherently Goodhart-susceptible because they measure process mechanics that can be optimized independently of outcomes. You can improve every DORA metric without shipping better software. And teams do. Constantly.
Want to boost deployment frequency? Deploy smaller. Break every PR into micro-changes. Ship config tweaks and one-liners. Your frequency goes up, your dashboard improves, and the total complexity of your output stays flat or drops.
Want to reduce lead time? Skip thorough code review. Reduce test coverage. Remove manual approval gates. Your lead time plummets. Your code quality follows.
Want to lower change failure rate? Only ship safe, incremental, low-risk changes. Never take on the risky refactor. Never do the migration that might fail. Your failure rate drops. Your technical debt compounds.
The DORA community knows this. They acknowledge it in every guide and every talk. And then they keep building tools and frameworks that create exactly these incentive structures. Because the alternative — admitting that process metrics alone are insufficient — threatens the entire ecosystem that depends on them.
The Consulting-Industrial Complex
I want to be blunt about something: DORA has become a business. A big one.
There are companies whose entire revenue model is selling DORA dashboards, DORA assessments, and DORA improvement programs. There are consultants who charge six figures to help your org "achieve elite DORA status." There are conference talks, certification programs, and multi-day workshops all centered on four metrics about pipeline speed.
None of these businesses have an incentive to tell you that DORA is insufficient. Their livelihoods depend on it being the central framework. So they'll acknowledge the limitations in a footnote, recite Goodhart's Law as a ritual incantation, and then sell you another dashboard.
I'm not questioning anyone's sincerity. Most people in this space genuinely believe they're helping. But incentive structures matter. And the incentive structure of the DORA ecosystem pushes toward treating process metrics as the whole picture, because that's what the products and services are built to measure.
What We Forgot to Measure
All of this process optimization has distracted us from the question that actually matters: what did your team build?
Not how fast the pipeline ran. Not how many times you deployed. Not your incident response time. Those are all fine things to know. But they're ancillary to the core question of engineering productivity.
When your CEO asks "how productive is our engineering team?", they're not asking about deployment frequency. They're asking: are we building things that matter? Is our team's output increasing or decreasing? Are we getting the most out of our engineering investment?
DORA can't answer any of that. And because DORA became the dominant framework, many organizations never built the capability to answer it at all. They spent five years perfecting their pipeline metrics and never developed a way to measure what's flowing through the pipeline.
That's the real damage. It's not that DORA metrics are wrong. It's that they've crowded out the conversation about what actually matters.
The AI Era Exposes the Gap
This problem is about to get much worse.
An engineer using AI tools effectively might ship fewer deployments but dramatically more complex work. Their deployment frequency goes down. Their lead time goes up (complex PRs need more review). By DORA's lens, they look like they're regressing.
Meanwhile, their actual output — the complexity, quality, and volume of code that ships to production — might have doubled. An engineer who used to ship one feature a week now ships three, each more sophisticated than what they could produce manually.
If you're evaluating your AI tool investment using DORA metrics, you're going to conclude it's not working. The pipeline didn't speed up. It might have slowed down. But the output flowing through it is fundamentally different. DORA is blind to this.
In the AI era, process speed matters less than ever. What matters is the substance and complexity of what your team is producing. And we have no mainstream framework for measuring that — because DORA ate the entire conversation.
What GitVelocity Does Differently
This is where I should be transparent about our angle. We built GitVelocity because we hit this exact wall.
At Headline, we had solid DORA metrics. Our pipeline was fast. Our deployments were frequent. And we still couldn't answer the most basic question: is our engineering team becoming more productive?
So we built a system that measures the thing DORA ignores: what actually shipped. Every merged PR gets scored by AI across six dimensions — Scope, Architecture, Implementation, Risk, Quality, and Performance & Security — producing a complexity score from 0 to 100.
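To illustrate the shape of that aggregation, here is a minimal sketch of combining six per-dimension scores into a single 0–100 complexity score. The dimension names come from the description above; the equal weights and the weighted-mean aggregation are illustrative assumptions, not GitVelocity's actual formula:

```python
# Six dimensions named in the text; weights below are assumed, for illustration only.
DIMENSIONS = ("scope", "architecture", "implementation",
              "risk", "quality", "performance_security")
WEIGHTS = {dim: 1 / len(DIMENSIONS) for dim in DIMENSIONS}

def complexity_score(dimension_scores: dict[str, float]) -> float:
    """Combine per-dimension scores (each 0-100) into one 0-100 score."""
    missing = set(DIMENSIONS) - dimension_scores.keys()
    if missing:
        raise ValueError(f"missing dimensions: {missing}")
    score = sum(WEIGHTS[d] * dimension_scores[d] for d in DIMENSIONS)
    # Clamp to the 0-100 range and round for display.
    return round(min(max(score, 0.0), 100.0), 1)

# Hypothetical scores for one merged PR
pr_scores = {
    "scope": 70, "architecture": 85, "implementation": 60,
    "risk": 40, "quality": 75, "performance_security": 55,
}
print(complexity_score(pr_scores))  # 64.2
```

The point of the sketch is the contrast with the DORA computations: the input here is an assessment of the change itself, not timestamps about its journey through the pipeline.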
This isn't a replacement for DORA. (Though honestly, for most teams, it's far more useful.) It's the metric that should have existed alongside DORA from the beginning. DORA tells you the pipeline is healthy. GitVelocity tells you the work flowing through it is complex, valuable, and increasing over time.
The difference in practice is profound. We can now see when an engineer adopts AI tools and their output complexity doubles. We can see when a team is running a fast pipeline but shipping trivial work. We can see when ambitious, risky work is paying off even though it temporarily worsened cycle time. None of that was visible before.
What I'd Say to the DORA Faithful
If you're invested in DORA — and statistically, you probably are — I'm not asking you to throw it away. Keep your deployment frequency tracking. Keep your MTTR dashboards. They're useful signals about process health.
But please stop pretending they measure productivity. They don't.
Please stop treating "elite DORA status" as the goal. It's a side effect of good practices, not a cause of good outcomes.
Please stop building your entire engineering performance narrative around four metrics that tell you nothing about the value of what your team built.
And please, especially in the AI era, start asking the question that DORA was never designed to answer: what did we actually ship, and was it any good?
The code is the code. It doesn't care how fast your pipeline is. It tells the truth about what was actually done.
GitVelocity measures engineering output by scoring every merged PR using AI — the metric DORA forgot. See how it works.
Conrad is CTO and Partner at Headline, where he leads data-driven investment across early-stage and growth funds with over $4B in AUM. Before becoming an investor, he founded Munchery (raised $130M+) and held engineering and product leadership roles at IAC and Convio (IPO 2010). He and the Headline engineering team built GitVelocity to help engineering organizations roll out agentic coding and measure its impact.