7 min read · Engineering Measurement

The Software Development KPIs Worth Tracking in 2026

Most engineering KPIs measure activity, not outcomes. Here are the KPIs that actually tell you something useful about your team's productivity.

Most engineering dashboards are full of numbers that go up and to the right. Commits per week. PRs merged. Deployment frequency. The charts look healthy. Leadership feels informed.

But ask a follow-up question — "Is the team shipping more valuable work?" — and the dashboard goes silent.

The problem isn't a lack of data. It's that most KPIs measure activity instead of outcomes. They tell you the team is busy. They don't tell you the team is productive. Those are very different things, and the gap between them is growing wider as AI tools change how engineering work gets done.

Here are 8 KPIs that actually tell you something useful. Not all of them are easy to measure. That's the point — the easy metrics are the ones everyone already tracks, and they're the ones that matter least.

1. Shipped Code Complexity

What it measures: The actual complexity of code that reaches production, scored on a consistent scale.

Why it matters: This is the closest thing we have to measuring engineering output directly. Not the activity that produced it, not the process it went through, not someone's pre-work estimate of how hard it would be — the actual artifact.

GitVelocity scores every merged PR on a 0-100 scale across six dimensions: Scope, Architecture, Implementation, Risk, Quality, and Performance/Security. The scoring rubric is fully transparent — engineers can see exactly why they got the score they got.

How to track it: GitVelocity runs AI analysis on every PR that merges to your main branch. You get individual scores per PR, aggregate velocity per engineer, and team-level trends over time.

What good looks like: Steady or increasing aggregate complexity scores over time, without a corresponding increase in change failure rate. If scores are rising but quality is stable, your team is shipping more substantial work.
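To make that concrete, here's a minimal sketch of the "rising complexity, stable quality" check, assuming you can export monthly average complexity scores and change failure rates (the function and data are illustrative, not a GitVelocity API):

```python
from statistics import mean

def healthy_complexity_trend(monthly_scores, monthly_cfr, cfr_tolerance=1.0):
    """True if aggregate complexity is rising while change failure rate
    (in percent) stays roughly flat, comparing first half vs. second half."""
    mid = len(monthly_scores) // 2
    score_up = mean(monthly_scores[mid:]) > mean(monthly_scores[:mid])
    cfr_stable = mean(monthly_cfr[mid:]) <= mean(monthly_cfr[:mid]) + cfr_tolerance
    return score_up and cfr_stable

# Hypothetical monthly averages: complexity on a 0-100 scale, CFR in percent
print(healthy_complexity_trend([48, 50, 53, 56], [4.0, 3.5, 4.2, 3.8]))  # True
```

The tolerance parameter is the judgment call: how much quality drift you're willing to accept while complexity climbs.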

2. Deployment Frequency

What it measures: How often your team deploys code to production.

Why it matters: Deployment frequency is one of the four DORA metrics, and it's the one that correlates most directly with engineering team health. Teams that deploy frequently tend to have smaller changesets, faster feedback loops, and lower risk per deployment.

How to track it: Pull from your CI/CD pipeline. Most tools (GitHub Actions, GitLab CI, CircleCI) can report deployment events. Track deployments per day or per week, broken down by team.

What good looks like: Elite teams deploy on demand, often multiple times per day. Low performers deploy weekly or monthly. But frequency alone isn't enough — you need to pair it with change failure rate to make sure you're not just deploying broken code faster.

The caveat: Deployment frequency is a process metric, not an output metric. A team deploying empty features ten times a day looks great on this KPI. Pair it with shipped code complexity to get the full picture.
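If your CI/CD provider can export deployment events, counting them per ISO week is a few lines. A sketch with hypothetical dates (in practice you'd pull these from your pipeline's API):

```python
from collections import Counter
from datetime import date

def deployments_per_week(deploy_dates):
    """Count deployments per (ISO year, ISO week) from a list of dates."""
    weeks = Counter()
    for d in deploy_dates:
        year, week, _ = d.isocalendar()
        weeks[(year, week)] += 1
    return dict(weeks)

# Hypothetical deployment dates exported from a CI/CD pipeline
deploys = [date(2026, 1, 5), date(2026, 1, 6), date(2026, 1, 6),
           date(2026, 1, 14)]
print(deployments_per_week(deploys))  # {(2026, 2): 3, (2026, 3): 1}
```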

3. AI Adoption Rate

What it measures: How effectively your team is adopting AI coding tools — not just whether they have licenses, but whether AI adoption is translating to measurable output changes.

Why it matters: This KPI didn't exist 18 months ago, but it's now critical. Most engineering orgs have invested in AI tools — Cursor, Claude Code, GitHub Copilot. Very few know whether that investment is paying off. Some engineers have doubled their output. Others are using AI to produce the same work slightly faster. A few aren't using it at all.

How to track it: Compare velocity trends before and after AI tool adoption. GitVelocity's individual velocity scores make this visible — you can see if an engineer's shipped complexity increased after they started using agentic coding tools.

What good looks like: Rising complexity scores per engineer with stable or improving quality metrics. If an engineer ships 40% more complex work per sprint after adopting Claude Code, and their change failure rate doesn't spike, that's real ROI.

What bad looks like: AI tool licenses deployed, no measurable change in output. This usually means engineers need better training, or the tools aren't well-integrated into their workflow.
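A hedged sketch of the before/after comparison, assuming you have per-PR complexity scores with merge dates and know when an engineer adopted the tool (all dates and scores below are hypothetical):

```python
from statistics import mean
from datetime import date

def before_after_velocity(pr_scores, adoption_date):
    """Percent change in mean complexity score before vs. after adoption.
    pr_scores: list of (merge_date, complexity_score) tuples."""
    before = [s for d, s in pr_scores if d < adoption_date]
    after = [s for d, s in pr_scores if d >= adoption_date]
    if not before or not after:
        return None  # not enough history on one side of the cutoff
    return round(100.0 * (mean(after) - mean(before)) / mean(before), 1)

# Hypothetical per-PR scores around a March 1 adoption date
scores = [(date(2026, 1, 10), 40), (date(2026, 2, 3), 44),
          (date(2026, 3, 20), 55), (date(2026, 4, 7), 63)]
print(before_after_velocity(scores, date(2026, 3, 1)))  # 40.5
```

In practice you'd also want to check change failure rate over the same windows, so a velocity jump isn't just faster shipping of riskier code.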

4. Change Failure Rate

What it measures: The percentage of deployments that cause a failure in production — rollbacks, hotfixes, or incidents.

Why it matters: This is the quality check on everything else. Fast deployment frequency is useless if 20% of deploys cause fires. High complexity scores mean less if the complex code keeps breaking. Change failure rate keeps the other metrics honest.

How to track it: Track incidents or rollbacks per deployment. Most incident management tools (PagerDuty, Opsgenie, Rootly) can report this. The DORA framework considers <5% to be elite performance.

What good looks like: Under 5% change failure rate, sustained over time. If your team deploys daily and less than 1 in 20 deployments causes an issue, your quality practices are working.

Watch for: Change failure rate often drops after you introduce code complexity measurement. When engineers know their work is being evaluated on quality and risk dimensions — not just volume — they tend to write more careful code.
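The computation itself is one line; the real work is reliably tagging which deployments caused rollbacks, hotfixes, or incidents. A minimal sketch with illustrative counts:

```python
def change_failure_rate(deployments, failures):
    """Percentage of deployments that caused a rollback, hotfix, or incident."""
    if deployments == 0:
        return 0.0
    return 100.0 * failures / deployments

# Illustrative quarter: 3 failures across 80 deployments
rate = change_failure_rate(80, 3)
print(f"{rate:.1f}%")  # 3.8% — under the 5% elite threshold
```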

5. Individual Velocity Trends

What it measures: How each engineer's shipped output changes over time — not their absolute score, but the trajectory.

Why it matters: Absolute comparisons between engineers are fraught. A backend engineer working on infrastructure will have different complexity profiles than a frontend engineer building UI features. But trends are revealing. Is this engineer shipping more substantial work month over month? Did their output drop after a team reorg? Did it spike after they started using a new AI tool?

How to track it: GitVelocity tracks individual velocity over time and surfaces trends automatically. Look at 4-week rolling averages to smooth out the natural variation from sprint to sprint.

What good looks like: Steady upward trends for engineers who are growing in capability. Stable output for experienced engineers working on consistent projects. The red flag is a downward trend with no obvious explanation — that's a signal worth investigating in a 1:1, not a performance review.

Important: Individual velocity trends are great for coaching — and when the scoring is gaming-resistant, leaderboards can create healthy competition. GitVelocity includes a leaderboard because you can't inflate a complexity score without shipping genuinely complex code. That changes the dynamic entirely.
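The 4-week rolling average that smooths sprint-to-sprint noise is simple to compute. A sketch over a hypothetical weekly score series:

```python
def rolling_average(weekly_scores, window=4):
    """Rolling average of a per-engineer weekly complexity series.
    Returns one value per week once the window is full."""
    return [round(sum(weekly_scores[i - window + 1 : i + 1]) / window, 1)
            for i in range(window - 1, len(weekly_scores))]

# Hypothetical weekly shipped-complexity totals for one engineer
weekly = [30, 50, 20, 40, 60, 44]
print(rolling_average(weekly))  # [35.0, 42.5, 41.0]
```

The smoothed series is what you'd look at for trajectory — the raw weekly numbers swing too much to read anything into.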

6. Review Turnaround Time

What it measures: How long PRs wait for review after being opened.

Why it matters: Review turnaround is one of the biggest hidden taxes on engineering velocity. A PR that waits 24 hours for review is a context switch waiting to happen. The engineer moves on to other work, then has to context-switch back when review comments arrive. Multiply this across a team and you lose days of productivity per sprint to waiting.

How to track it: Pull from your git provider's API. Measure the time from PR opened to first review, and from PR opened to merge. Break it down by team and reviewer.

What good looks like: First review within 4 hours during working hours. PRs merging within 24 hours of opening. If reviews consistently take more than a day, you have a bottleneck that's costing you more velocity than most feature work would add.

How to fix it: Set team working agreements for review turnaround. Many teams use "review before lunch" or "review within 4 hours" norms. Tools like Swarmia and LinearB can track and surface this automatically.
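A sketch of the turnaround calculation, assuming you've already pulled opened-at and first-review timestamps from your git provider's API (the timestamps below are made up):

```python
from datetime import datetime
from statistics import median

def median_hours_to_first_review(prs):
    """Median hours between PR open and first review.
    prs: list of (opened_at, first_review_at) datetime pairs."""
    waits = [(review - opened).total_seconds() / 3600
             for opened, review in prs]
    return round(median(waits), 1)

# Hypothetical PR timestamps from a git provider's API
prs = [(datetime(2026, 1, 5, 9), datetime(2026, 1, 5, 11)),   # 2h wait
       (datetime(2026, 1, 5, 14), datetime(2026, 1, 6, 14)),  # 24h wait
       (datetime(2026, 1, 6, 10), datetime(2026, 1, 6, 13))]  # 3h wait
print(median_hours_to_first_review(prs))  # 3.0
```

Median matters more than mean here: one PR that sits over a weekend shouldn't dominate the metric. A production version would also subtract non-working hours.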

7. PR Size Distribution

What it measures: The distribution of PR sizes across your team — how many PRs are small, medium, and large.

Why it matters: PR size is one of the strongest predictors of review quality and deployment risk. Large PRs get rubber-stamped — reviewers skim instead of reading carefully. They're also harder to roll back when something goes wrong. Small, focused PRs get better reviews, merge faster, and fail more gracefully.

How to track it: Most git analytics tools can report PR size distributions. Track the percentage of PRs under 400 lines changed, and flag PRs over 1,000 lines for review.

What good looks like: 80%+ of PRs under 400 lines. A long tail of larger PRs is normal — some work can't be split effectively — but if the median PR is over 500 lines, your team is probably batching too much work together.

The nuance: AI-generated code is changing this calculus. An engineer using Claude Code might produce a 600-line PR that's as coherent as a hand-written 200-line one, because the AI generates complete implementations rather than incremental fragments. Consider PR complexity scores alongside raw size.
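A sketch of the distribution check, assuming you have a lines-changed count per PR (the thresholds follow the 400- and 1,000-line guidance above):

```python
def size_distribution(pr_line_counts, small=400, large=1000):
    """Return (percent of PRs under `small` lines, list of PRs over `large`)."""
    total = len(pr_line_counts)
    under_small = sum(1 for n in pr_line_counts if n < small)
    flagged = [n for n in pr_line_counts if n > large]
    return round(100.0 * under_small / total, 1), flagged

# Hypothetical lines-changed counts for one sprint's merged PRs
sizes = [120, 80, 350, 600, 1500, 90, 210, 400]
pct_small, flagged = size_distribution(sizes)
print(pct_small, flagged)  # 62.5 [1500]
```

Here 62.5% of PRs are under 400 lines — short of the 80% target — and one 1,500-line PR would be flagged for a closer look.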

8. Team Velocity Growth

What it measures: The aggregate shipped complexity of the entire team over time.

Why it matters: This is the executive-level KPI. Individual trends are for coaching. Team velocity growth answers the question leadership actually asks: "Is our engineering team getting more productive?"

How to track it: Sum or average the complexity scores across all PRs merged by the team per sprint or per month. Track the trend line. GitVelocity provides this as a default dashboard view.

What good looks like: Steady upward trend, especially after introducing AI tools, hiring new engineers, or removing process bottlenecks. A flat line after a major investment (new tools, new hires) is a red flag that the investment isn't translating to output.

What to watch for: Velocity growth that comes entirely from one or two high-performing engineers. Healthy teams show broadly distributed growth, not dependence on individual heroes.
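One quick way to spot hero-dependence is the share of team velocity contributed by the top engineer. A sketch with hypothetical per-engineer totals:

```python
def velocity_concentration(engineer_totals):
    """Percent of total team velocity contributed by the top engineer.
    engineer_totals: {engineer: summed complexity scores for the period}."""
    total = sum(engineer_totals.values())
    top = max(engineer_totals.values())
    return round(100.0 * top / total, 1)

# Hypothetical per-engineer complexity totals for one month
team = {"ana": 180, "ben": 160, "cho": 170, "dev": 150}
print(velocity_concentration(team))  # 27.3
```

For a four-person team an even split is 25%, so 27.3% is healthy; a top share well above that suggests growth is concentrated rather than distributed.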

The KPIs That Didn't Make the List

Notice what's missing: lines of code, commit count, hours logged, story points, and tickets completed. These are the most commonly tracked engineering metrics, and they're the least useful. They measure inputs and proxies, not outcomes.

I've written elsewhere about why these metrics measure nothing and why story points in particular failed. The short version: if a metric can be gamed by changing git habits or inflating estimates, it's not measuring anything real.

Putting It Together

No single KPI tells the whole story. The 8 metrics above work together:

  • Shipped Code Complexity tells you what was produced
  • Deployment Frequency + Change Failure Rate tell you how safely it was delivered
  • AI Adoption Rate tells you whether your tool investments are working
  • Individual Velocity Trends enable coaching conversations
  • Review Turnaround + PR Size identify process bottlenecks
  • Team Velocity Growth answers the executive question

Start with the ones you can measure today. For most teams, that's deployment frequency and change failure rate (you probably already have this data in your CI/CD pipeline) plus shipped code complexity (which GitVelocity can provide from day one with zero configuration beyond connecting your repos).

The metrics you track shape the incentives your team responds to. Track activity and you'll get busy engineers. Track output and you'll get productive ones.


GitVelocity measures engineering velocity by scoring every merged PR using AI. Track shipped code complexity, individual trends, and team growth from day one.

See how it works.

Written by Conrad Chu

Conrad is CTO and Partner at Headline, where he leads data-driven investment across early stage and growth funds with over $4B in AUM. Before becoming an investor, he founded Munchery (raised $130M+) and held engineering and product leadership roles at IAC and Convio (IPO 2010). He and the Headline engineering team built GitVelocity to help engineering organizations roll out agentic coding and measure its impact.