· 8 min read · AI Measurement

Is Your AI Investment Working? The Engineering Metrics That Tell You

A practical metrics framework for tracking AI tool adoption across your engineering org. Lagging indicators, leading indicators, and the red flags to watch for.

You've rolled out AI coding tools to your engineering team. Cursor, Claude Code, Copilot — maybe all three. Now what? How do you know adoption is actually happening? How do you tell the difference between an engineer who's using AI to ship twice as much and one who opened the tool once and went back to their old workflow?

Most engineering managers don't have a metrics framework for this. They have anecdotes. They have gut feelings. They have the occasional Slack message that says "Claude Code just saved me two hours."

That's not enough. Here's a practical metrics playbook for tracking AI adoption across your engineering org — what to measure, what the signals mean, and what to do about the patterns you'll find.

Lagging Indicators: Velocity Score Changes

Lagging indicators tell you whether AI adoption is actually producing results. They don't tell you it's happening in real time — they confirm it after the fact. But they're the most reliable signals you have.

Per-Engineer Velocity Trends

This is the single most important metric. Track the total complexity of shipped code per engineer per week, measured by scoring every merged PR.

What to watch for:

Upward trend over 4-8 weeks. When an engineer starts using AI tools effectively, velocity typically stays flat for 1-2 weeks (learning period), then increases 20-40% over the following 2-4 weeks, then stabilizes at a new baseline. This is the healthy adoption curve.

Sudden spike followed by return to baseline. This usually means the engineer tried AI tools for a specific project and then stopped. It's not sustained adoption — it's experimentation. Worth noting, but don't count it as adoption.

Flat line despite tool access. The engineer has the license but isn't using it effectively. This is a coaching signal, not a performance signal.

Track these trends at the individual level. Team aggregates hide the variation that matters.
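The three patterns above can be expressed as a simple classifier over a per-engineer series of weekly velocity scores. This is an illustrative sketch: the thresholds (a 20% sustained lift, a 50% transient spike) and the four-week baseline window are assumptions for the example, not GitVelocity's actual cutoffs.

```python
from statistics import mean

def classify_velocity_trend(weekly_scores, baseline_weeks=4):
    # Compare recent weeks against an early baseline window.
    # Thresholds are illustrative, not GitVelocity's real cutoffs.
    baseline = mean(weekly_scores[:baseline_weeks])
    recent = mean(weekly_scores[-baseline_weeks:])
    post = weekly_scores[baseline_weeks:] or weekly_scores
    peak = max(post)

    if recent >= baseline * 1.2:
        return "sustained-adoption"   # new, higher baseline
    if peak >= baseline * 1.5 and recent < baseline * 1.1:
        return "experimentation"      # spike, then back to baseline
    return "flat"                     # coaching signal, not performance

# Healthy curve: flat learning period, then a lift that sticks.
classify_velocity_trend([100, 105, 98, 102, 110, 125, 135, 132, 130, 134])
```

Running this weekly per engineer turns the trend lines into labels you can segment and act on, rather than charts you eyeball.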

PR Complexity Distribution

Before AI adoption, most teams have a consistent distribution of PR complexity. Some PRs are trivial (config changes, dependency updates), most are moderate, and a few are complex architectural changes.

After effective AI adoption, this distribution shifts. Watch for:

The middle moving up. Moderate PRs become higher complexity. Engineers are shipping more ambitious changes per PR because AI handles the implementation details.

More high-complexity PRs. Engineers attempt work they would have deferred or broken into smaller pieces. The total count of PRs scored above 50 (on a 0-100 scale) increases.

Fewer trivial PRs. Surprisingly, AI adoption often reduces the count of very low-complexity PRs. Not because engineers stop doing trivial work — but because they batch it into larger, more complete changes instead of shipping incremental commits.
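One way to watch this shift is to bucket PR complexity scores (on the 0-100 scale mentioned above) and compare the distribution before and after rollout. The bucket cutoffs here are illustrative assumptions; only the "above 50" line comes from the text.

```python
from collections import Counter

def complexity_buckets(pr_scores):
    # Bucket 0-100 PR complexity scores. Cutoffs are illustrative;
    # the text only fixes "above 50" as high complexity.
    def bucket(score):
        if score < 20:
            return "trivial"
        if score <= 50:
            return "moderate"
        return "high"
    return Counter(bucket(s) for s in pr_scores)

before = complexity_buckets([5, 10, 15, 30, 35, 40, 45, 60])
after = complexity_buckets([12, 35, 45, 48, 55, 62, 70, 75])
# After effective adoption: fewer trivial PRs, more scored above 50.
```

Comparing these counters month over month makes the "middle moving up" pattern visible without any per-engineer analysis.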

Quality Dimension Stability

Velocity increases only count if quality doesn't decline. Track the Quality and Risk dimensions of your PR scores alongside velocity.

Healthy pattern: Velocity increases while Quality scores remain stable or improve. This means engineers are shipping more work without cutting corners.

Warning pattern: Velocity increases but Quality scores decline. Engineers might be accepting AI-generated code without sufficient review. This is common in the first few weeks of adoption and usually self-corrects, but it's worth watching.

Leading Indicators: Tool License Usage

Leading indicators predict adoption before velocity data confirms it. They're less reliable but faster.

License Activation Rate

Of the seats you've provisioned, how many are actively used? Track this weekly.

Target: 80%+ within 4 weeks of rollout. If you're below 60% after a month, you have an awareness or access problem. Some engineers might not know the tool is available, or setup friction is preventing adoption.

Don't confuse activation with effectiveness. An activated license means the engineer is trying. It doesn't mean they're succeeding. Activation is necessary but not sufficient.

Usage Frequency (If Available)

Some tools report daily or weekly active usage. This is marginally more useful than activation alone.

Daily users are probably integrating the tool into their workflow. That's the behavior that produces velocity increases.

Weekly or less suggests dabbling. The engineer is using it for specific tasks rather than as a core workflow tool. This might be appropriate for their work, or it might mean they haven't found the right use case yet.

But remember: usage frequency is still an input metric. An engineer who uses AI 10 minutes per day to plan their approach might get more value than one who uses it 6 hours per day to generate boilerplate. Don't over-index on frequency.

Quality Indicators: Change Failure Rate

AI tools introduce a specific quality risk: engineers shipping code they don't fully understand. Track these metrics to catch problems early.

Post-Merge Defect Rate

Are PRs from AI-adopting engineers producing more bugs? Track defects linked to PRs shipped after AI adoption vs. before.

Expected pattern: Post-merge defect rates should stay flat or decline. AI tools often improve quality by generating test coverage and catching edge cases. If defect rates increase, engineers need coaching on reviewing AI-generated code more carefully.

Test Coverage Changes

Watch test coverage trends alongside AI adoption.

Healthy pattern: Coverage increases. AI tools are excellent at generating tests, and engineers who use them tend to ship better-tested code.

Neutral pattern: Coverage stays flat. Engineers are maintaining their existing standards. Fine.

Concerning pattern: Coverage declines while velocity increases. Engineers are shipping more code faster but with less testing. This is a regression that AI tools should prevent, not cause. Investigate.

The Adoption Curve: Mapping Your Team

AI adoption follows a predictable curve across any team. Understanding where each engineer sits on this curve is essential for targeted support.

Enthusiasts (10-15% of team)

These engineers adopted AI tools immediately, often before the company officially rolled them out. They're already power users by the time you start tracking.

Identify by: Highest velocity increases in the first month. Often vocal advocates in team channels.

Action: Leverage them. Pair them with later adopters. Have them lead workshops. Their enthusiasm is contagious, but more importantly, they can show concrete workflows rather than abstract benefits.

Early Majority (30-40% of team)

These engineers are willing to try AI tools but need to see evidence that they work. They adopt within 4-8 weeks of rollout, often after watching the enthusiasts succeed.

Identify by: Velocity increases appearing 4-8 weeks after rollout. They often ask questions in team channels about specific use cases.

Action: Provide targeted support. These engineers respond well to documentation and use-case examples. "Here's how to use Cursor for [specific task they do regularly]" is more effective than general evangelism.

Late Majority (30-40% of team)

These engineers adopt when AI tools become the team norm rather than an experiment. They might need 3-6 months.

Identify by: Flat velocity for the first 2-3 months, then gradual increases. They often adopt after a team restructuring or process change makes AI tools the path of least resistance.

Action: Reduce friction. Make AI tools the default in your development environment. Integrate them into your onboarding. Make not using them the deviation rather than using them.

Resisters (5-15% of team)

These engineers don't adopt AI tools, or adopt them minimally. This isn't necessarily a problem.

Identify by: Flat velocity throughout the observation period despite having tool access.

Action: Understand why. Some engineers do work that genuinely doesn't benefit much from current AI tools (deeply specialized domains, hardware-adjacent code, complex debugging). Others have legitimate concerns about code quality or workflow disruption. A conversation is more useful than a mandate.

The Adoption Gap: Effective Use vs. Having a License

This is the metric most managers miss. The adoption gap is the difference between engineers who have AI tools and engineers who are getting measurable value from them.

How to calculate it:

Adoption_Gap = Engineers_With_Licenses - Engineers_With_Velocity_Increase

If you have 30 engineers with Cursor licenses and only 18 show meaningful velocity increases after 8 weeks, your adoption gap is 12 — 40% of your investment is underperforming.
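The calculation is trivial, but putting it in code makes the second number, the underperforming share of your investment, explicit:

```python
def adoption_gap(licensed, with_velocity_increase):
    # Adoption_Gap = Engineers_With_Licenses - Engineers_With_Velocity_Increase,
    # plus the fraction of licensed seats not showing measurable value.
    gap = licensed - with_velocity_increase
    return gap, gap / licensed

gap, share = adoption_gap(30, 18)
# gap = 12 engineers; share = 0.4, i.e. 40% of seats underperforming
```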

This isn't about blaming the 12. It's about understanding why:

Setup friction. Is the tool hard to configure for your specific stack? Fix the onboarding.

Wrong tool for the work. Some engineers do work that Cursor handles well. Others don't. Maybe they need Claude Code for agentic workflows instead of IDE-based autocomplete.

Training gap. The engineer knows the tool exists but doesn't know how to use it effectively for their specific workflow. Pairing sessions with power users usually fix this quickly.

Genuine mismatch. The engineer's workflow genuinely doesn't benefit from AI tools right now. This is rarer than people think, but it's real.

Red Flags: Velocity Flat Despite Tool Access

This is the most important signal in your dashboard. If an engineer or team has had AI tool access for 8+ weeks and velocity hasn't increased at all, something is wrong.

Individual flat velocity: Usually a training or workflow integration problem. The engineer might be using the tool for the wrong tasks, or might not be using it at all despite having a license.

Team-wide flat velocity: More concerning. Could indicate tool-stack mismatch, insufficient training, or cultural resistance to AI adoption. Investigate at the team level before addressing individuals.

Velocity flat with increased activity metrics: The scariest pattern. The engineer is doing more work (more commits, more PRs) but the complexity hasn't changed. They're using AI to do the same work slightly differently, not to ship more valuable work. This is the AI equivalent of being busy without being productive.

What We Learned at Headline

We tracked all of these metrics internally as we adopted AI tools across our team. The biggest surprises:

Adoption was faster than expected for some, slower for others. The distribution was bimodal — engineers either adopted quickly (within 3 weeks) or took much longer (8+ weeks). Very few fell in between.

Junior engineers showed the largest velocity increases. AI tools compressed the implementation skill gap. Juniors who adopted aggressively could ship work that would have been above their level. This has implications for hiring, leveling, and team composition that most managers haven't fully processed.

The announcement effect was real. When we announced that GitVelocity scores would factor into annual performance reviews, January 2026 saw a 50% velocity jump. Measurement changes behavior. Make sure you're measuring the right thing.

Tool switching mattered. Some engineers didn't show velocity increases with their first AI tool but showed significant increases after switching to a different one. Cursor worked better for some workflows; Claude Code worked better for others. Don't assume one tool fits all.

Putting It Together: Your Weekly Dashboard

Here's what your AI adoption dashboard should show:

Team level:

  • Aggregate weekly velocity trend (8+ week view)
  • Velocity before vs. after AI tool rollout
  • Adoption gap: licenses vs. engineers with velocity increases

Individual level:

  • Per-engineer velocity trend line
  • Adoption curve segment (enthusiast / early majority / late majority / resister)
  • Quality dimension scores alongside velocity

Red flags:

  • Engineers with 8+ weeks of tool access and flat velocity
  • Quality score declines concurrent with velocity increases
  • High adoption gap (>30% of licensed engineers not showing impact)
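The red-flag checks can be automated as a weekly scan over per-engineer records. The record keys below are illustrative assumptions for the sketch, not a real GitVelocity API; the thresholds (8 weeks, 30% gap) come from the list above.

```python
def red_flags(engineers, weeks_threshold=8, gap_threshold=0.30):
    # Scan per-engineer records for the three dashboard red flags.
    # Record keys here are assumed for illustration, not a real API.
    licensed = [e for e in engineers if e["has_license"]]
    flags = {}

    flat = [e["name"] for e in licensed
            if e["weeks_with_access"] >= weeks_threshold and not e["velocity_up"]]
    if flat:
        flags["flat-despite-access"] = flat

    declines = [e["name"] for e in licensed
                if e["velocity_up"] and e["quality_declined"]]
    if declines:
        flags["quality-decline-with-velocity-increase"] = declines

    if licensed:
        gap = 1 - sum(e["velocity_up"] for e in licensed) / len(licensed)
        if gap > gap_threshold:
            flags["high-adoption-gap"] = round(gap, 2)

    return flags

team = [
    {"name": "ana", "has_license": True, "weeks_with_access": 10,
     "velocity_up": False, "quality_declined": False},
    {"name": "ben", "has_license": True, "weeks_with_access": 10,
     "velocity_up": True, "quality_declined": True},
    {"name": "cam", "has_license": True, "weeks_with_access": 10,
     "velocity_up": True, "quality_declined": False},
]
red_flags(team)
```

A scan like this is what turns "review weekly, act monthly" into a routine rather than a judgment call.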

Review this weekly. Act on the signals monthly. The data compounds — each month of tracking makes your adoption picture clearer and your interventions more targeted.

Measure What Matters

AI tool adoption isn't something you can mandate into existence. But it is something you can measure, understand, and accelerate.

The metrics framework is straightforward: track output complexity per engineer over time, segment your team by adoption curve, identify gaps, and act on red flags. Don't rely on surveys. Don't rely on tool vendor dashboards. Measure what your team actually ships.

The engineering organizations that get this right will have a compounding advantage. Their teams will adopt AI tools faster, use them more effectively, and ship more valuable work. The ones that don't will be flying blind — hoping their AI investment is paying off but never really knowing.

Hope isn't a metrics strategy.


GitVelocity measures engineering velocity by scoring every merged PR using AI. Track AI adoption naturally through output data, not tool surveillance.

See how it works.

Written by Conrad Chu

Conrad is CTO and Partner at Headline, where he leads data-driven investment across early stage and growth funds with over $4B in AUM. Before becoming an investor, he founded Munchery (raised $130M+) and held engineering and product leadership roles at IAC and Convio (IPO 2010). He and the Headline engineering team built GitVelocity to help engineering organizations roll out agentic coding and measure its impact.