
DORA Metrics Are Necessary But Not Sufficient

DORA metrics measure how fast your pipeline runs. They don't measure what's moving through it. Here's what's missing — and why it matters.

Let me start by saying something unpopular in the "metrics are broken" discourse: DORA metrics are good. They represent genuine progress in how we think about engineering team health. The research behind them is solid. The four key metrics — deployment frequency, lead time for changes, change failure rate, and time to restore service — capture something real about software delivery performance.

I'm not here to tear DORA down. I'm here to explain why it's not enough.

What DORA Actually Measures

The DORA framework, born from the Accelerate research by Nicole Forsgren, Jez Humble, and Gene Kim, measures delivery process health. It answers the question: "How effectively does our pipeline turn code into running software?"

Deployment frequency — How often does your team deploy to production? High-performing teams deploy multiple times per day.

Lead time for changes — How long from commit to production? Elite teams do it in less than an hour.

Change failure rate — What percentage of deployments cause failures? Best teams stay under 5%.

Time to restore service (MTTR) — When failures happen, how quickly do you recover? Top teams recover in under an hour.
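To make these concrete, here's a minimal sketch of how the four metrics could be computed from raw deployment and incident records. The `Deployment` and `Incident` shapes and their field names are assumptions for illustration, not any particular tool's schema.

```python
from dataclasses import dataclass
from datetime import datetime
from statistics import mean

@dataclass
class Deployment:
    committed_at: datetime   # first commit in the change
    deployed_at: datetime    # when it landed in production
    caused_failure: bool     # did this deploy trigger an incident?

@dataclass
class Incident:
    started_at: datetime
    resolved_at: datetime

def dora_metrics(deploys: list[Deployment], incidents: list[Incident], days: int = 7):
    """Compute the four DORA metrics over a reporting window.
    Hypothetical schema; assumes at least one deploy in the window."""
    freq = len(deploys) / days                                     # deploys per day
    lead_time = mean(
        (d.deployed_at - d.committed_at).total_seconds() / 3600   # hours, commit -> prod
        for d in deploys
    )
    cfr = sum(d.caused_failure for d in deploys) / len(deploys)    # change failure rate
    mttr = mean(
        (i.resolved_at - i.started_at).total_seconds() / 3600     # hours to restore
        for i in incidents
    ) if incidents else None                                       # undefined with no incidents
    return freq, lead_time, cfr, mttr
```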

These are genuinely useful. A team with slow deployments, long lead times, high failure rates, and slow recovery has real problems. DORA surfaces those problems. If your team scores poorly on DORA, fix that first.

But here's the gap that nobody talks about enough.

The Question DORA Can't Answer

Imagine two engineering teams at the same company. Both deploy multiple times per day. Both have lead times under an hour. Both have change failure rates under 5%. Both recover from incidents within minutes.

By DORA standards, both are elite.

Now look at what they're actually shipping:

Team A deploys 15 times this week. Most deployments are small configuration changes, copy updates, and minor bug fixes. One deployment is a meaningful feature. Total engineering complexity shipped: moderate.

Team B deploys 8 times this week. Every deployment includes significant feature work — new API endpoints, complex business logic, architectural improvements. Total engineering complexity shipped: high.

DORA says Team A is performing better (higher deployment frequency). Reality says Team B is delivering more value. DORA measures the speed of the conveyor belt. It doesn't measure what's on it.
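The gap shows up the moment you weight each deployment by what it carried. A back-of-envelope version, with invented per-deploy complexity scores:

```python
# Hypothetical complexity scores (0-100) per deploy, invented for illustration.
team_a = [5, 3, 4, 6, 2, 5, 3, 4, 5, 6, 3, 4, 5, 2, 60]   # 15 deploys, one real feature
team_b = [55, 70, 48, 62, 58, 66, 51, 73]                  # 8 deploys, all substantial

print(len(team_a), sum(team_a))  # 15 deploys, 117 total complexity shipped
print(len(team_b), sum(team_b))  # 8 deploys,  483 total complexity shipped
```

Deployment frequency ranks Team A first; complexity-weighted output ranks Team B roughly four times higher.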

This isn't a flaw in DORA — it's a scope limitation. DORA was designed to measure process, not output. The problem is when organizations treat it as a complete picture of engineering productivity.

Cycle Time: Same Problem, Different Angle

Cycle time — the duration from work starting to work shipping — is another popular metric. Like DORA, it measures process speed. And like DORA, it's blind to what's being processed.

A cycle time of 2 hours looks great. But was that 2 hours for a typo fix or for a complex database migration? The metric treats them identically. A team that ships trivial changes fast will have better cycle time than one shipping complex, carefully reviewed architectural work.

Worse, optimizing for cycle time creates a perverse incentive: ship simpler things. Complex PRs require more review, more testing, more consideration. They naturally have longer cycle times. If you're measured on cycle time, the rational move is to avoid complex work or split it into pieces so small they lose their architectural coherence.
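You can see the trap in miniature: drop the one complex item and the dashboard improves while the shipped work collapses (all numbers invented):

```python
from statistics import mean

# (cycle time in hours, complexity score) - hypothetical work items
work = [(1.5, 3), (2.0, 5), (18.0, 80)]   # two trivial changes, one migration

print(mean(t for t, _ in work))           # ~7.2 h average cycle time
trimmed = work[:2]                        # skip the migration...
print(mean(t for t, _ in trimmed))        # 1.75 h - the metric "improved"
print(sum(c for _, c in work), sum(c for _, c in trimmed))  # 88 vs 8 complexity shipped
```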

I've seen this happen. Teams obsessively tracked cycle time, celebrated when it dropped, and didn't notice that the complexity of their shipped work was declining in lockstep.

MTTR: Useful for Operations, Useless for Productivity

Mean Time to Recovery is a critical operational metric. You absolutely want to know how quickly your team can respond to incidents. But it tells you nothing about day-to-day engineering productivity.

A team that ships no features and has no incidents has an undefined MTTR. A team that ships transformative features and occasionally breaks things has a measurable (and potentially concerning) MTTR. Which team is more productive?
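The degenerate case is easy to see in code: MTTR is an average over incidents, and with zero incidents there is nothing to average (sketch, hypothetical data):

```python
from statistics import mean

def mttr_hours(recovery_times_hours: list[float]) -> float | None:
    """Mean time to recovery; undefined (None) when there were no incidents."""
    return mean(recovery_times_hours) if recovery_times_hours else None

print(mttr_hours([]))           # None - quiet team that shipped nothing risky
print(mttr_hours([0.5, 2.0]))   # 1.25 h - ambitious team that broke things twice
```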

MTTR also creates an incentive problem: the safest way to improve MTTR is to reduce blast radius, which often means shipping smaller, safer changes. That's not always wrong — but it can become a reason to avoid the high-risk, high-reward work that actually moves products forward.

The Goodhart's Law Problem

"When a measure becomes a target, it ceases to be a good measure."

This hits DORA metrics particularly hard because they're so actionable. Want to improve deployment frequency? Remove manual gates, automate everything, deploy in smaller batches. Those are all good practices. But if you're deploying more often without increasing the value of what you're deploying, you've optimized the metric without improving the outcome.

I've talked to CTOs who proudly report "we deploy 50 times a day" and can't tell me what their team shipped last month in terms of actual features or complexity. The pipeline is fast. What's flowing through it is invisible.

What's Missing: A Measure of What Shipped

DORA tells you how fast and how reliably you ship. It doesn't tell you what you shipped or how complex it was.

What's missing is output measurement — a way to evaluate the engineering complexity of the work that actually reaches production. Not the speed of the process, but the substance of the result.

This requires evaluating the actual artifact: the code that merged. Not how quickly it was deployed. Not how many steps the pipeline has. The code itself — what changed, how complex those changes were, and what engineering effort they represent.

That's what we built GitVelocity to do. Every merged PR gets scored across six dimensions — Scope, Architecture, Implementation, Risk, Quality, and Performance & Security — producing a 0-100 complexity score. This isn't a replacement for DORA. It's the missing complement.
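For the shape of the idea only: GitVelocity's actual scoring is produced by an AI model, but a composite of six 0-100 dimension scores might look like the toy below. The equal weighting and dictionary keys are my assumptions, not the product's internals.

```python
# Toy composite of six dimension scores; equal weights are an assumption,
# not GitVelocity's actual (AI-driven) scoring.
DIMENSIONS = ("scope", "architecture", "implementation",
              "risk", "quality", "performance_security")

def composite_score(scores: dict[str, float]) -> float:
    """Average six 0-100 dimension scores into one 0-100 complexity score."""
    return sum(scores[d] for d in DIMENSIONS) / len(DIMENSIONS)

pr = {"scope": 70, "architecture": 55, "implementation": 62,
      "risk": 40, "quality": 80, "performance_security": 45}
print(round(composite_score(pr), 1))  # 58.7 -> one merged PR's complexity score
```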

Think of it this way:

| Metric | What it measures | What it tells you |
| --- | --- | --- |
| DORA | Process health | How well your pipeline works |
| Cycle time | Process speed | How fast work flows through |
| GitVelocity | Output complexity | What actually shipped and how complex it was |

You need both. A team with great DORA metrics and low velocity is running an efficient pipeline that ships trivial work. A team with high velocity and poor DORA metrics is doing complex work but can't deliver it reliably. The goal is both: a healthy process that ships complex, valuable work consistently.

The AI Era Makes This Worse

AI tools amplify the gap between process metrics and output measurement.

An engineer using Claude Code might ship fewer, larger changes. Their deployment frequency could actually decrease while their output dramatically increases. Their lead time might lengthen because complex PRs take longer to review. By DORA standards, their adoption of AI tools looks like a regression.

Meanwhile, their actual engineering output — the complexity and volume of shipped code — might have doubled.

If you're using DORA to evaluate your AI tool investment, you're looking at the wrong dashboard. AI adoption shows up in output, not in process speed. You need a way to measure what shipped, not just how fast the pipeline ran.

Using DORA and Output Measurement Together

Here's the practical framework:

Use DORA to answer: Is our engineering process healthy? Are we deploying frequently? Are we recovering quickly? Is our failure rate acceptable?

Use output measurement to answer: What did our team actually ship? Is the complexity of our work increasing? Are individual engineers growing? Is our AI investment paying off?

Use both together to answer: Are we shipping complex, valuable work through a reliable, fast pipeline?

A team with high DORA scores and rising velocity is operating at full capacity. A team with high DORA scores and flat velocity has a fast pipeline carrying the same amount of work — they need to invest in capability, not process. A team with low DORA scores and high velocity has productive engineers bottlenecked by a slow pipeline — fix the process.
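That diagnosis fits in a small decision table. A sketch, with the labels paraphrased from the paragraph above and the fourth case (both weak) added per the earlier advice to fix DORA first:

```python
def diagnose(dora_healthy: bool, velocity_rising: bool) -> str:
    """Map the two signals to a next step (illustrative labels, not a standard)."""
    if dora_healthy and velocity_rising:
        return "Operating at full capacity - keep going"
    if dora_healthy and not velocity_rising:
        return "Fast pipeline, flat output - invest in capability, not process"
    if not dora_healthy and velocity_rising:
        return "Productive engineers, slow pipeline - fix the process"
    return "Both weak - fix the DORA basics first, then measure output"

print(diagnose(dora_healthy=True, velocity_rising=False))
```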

Neither metric alone gives you this picture. Together, they're comprehensive.

The Takeaway

DORA metrics earned their place. They're the best thing to happen to engineering process measurement. Keep tracking them. Keep improving them.

But stop pretending they measure productivity. They measure the health of the system through which productivity flows. The productivity itself — the complexity and volume of work that actually ships — requires a different lens.

The code is the code. DORA can tell you how fast it deployed. Only output measurement can tell you what it was.


GitVelocity measures engineering velocity by scoring every merged PR using AI. It complements process metrics like DORA by measuring what actually shipped.

See how it works.

Written by Conrad Chu

Conrad is CTO and Partner at Headline, where he leads data-driven investment across early stage and growth funds with over $4B in AUM. Before becoming an investor, he founded Munchery (raised $130M+) and held engineering and product leadership roles at IAC and Convio (IPO 2010). He and the Headline engineering team built GitVelocity to help engineering organizations roll out agentic coding and measure its impact.