
Evaluating AI Fluency in Your Engineering Hiring Process

Traditional interviews test skills AI makes obsolete. Here's how to evaluate what actually matters: decomposition, evaluation, and shipping speed.

Last quarter, we made two engineering hires within three weeks of each other. Similar roles. Similar compensation. Very different results.

The first candidate crushed the whiteboard round. Clean binary search implementation. Solid system design. Asked smart clarifying questions. We hired them feeling confident.

The second candidate was shakier on the algorithmic portion but had built an impressive side project using Cursor and Claude Code. During the walkthrough, they could explain every architectural decision, why they'd overridden the AI's suggestions in certain places, and where they'd let the AI handle the boilerplate. They were transparent about what was AI-generated and what wasn't.

Six weeks into the job, the second hire was shipping PRs with significantly higher complexity scores than the first. Not because they were smarter. Because they knew how to get leverage from their tools.

That experience -- and repeating it twice more -- forced us to redesign our interview process from scratch.

The Skills We Used to Test Are Depreciating

I'm not going to argue that computer science fundamentals don't matter. They do. You need to understand hash maps to know when the AI suggests an O(n^2) solution where an O(n) one exists. You need to understand concurrency to catch race conditions in AI-generated async code.

But the way we test for these skills is wrong. Asking someone to implement quicksort on a whiteboard tells you they can implement quicksort on a whiteboard. It doesn't tell you whether they can build a feature in 2026.

The skills that actually predict performance now are different. And most interview processes don't test for them at all.

Three Skills That Matter in 2026

After a year of correlating interview performance with on-the-job velocity scores, I've identified three skills that actually predict how well an engineer will ship.

Problem decomposition for AI. Great AI-augmented engineers don't throw entire problems at an LLM and hope for the best. They break work into chunks that AI handles well -- discrete, well-scoped tasks with clear inputs and outputs. They understand the granularity at which AI is reliable and where it starts hallucinating.

This is a skill. It's not obvious. Engineers who haven't developed it either over-rely on AI (giving it vague, massive prompts) or under-use it (writing everything manually because the AI "doesn't understand" the full picture).

Output evaluation. AI generates plausible code. Sometimes that code is subtly wrong -- correct syntax, reasonable structure, but wrong business logic or poor edge case handling. The ability to read AI output with the same critical eye you'd apply to a junior developer's PR is the difference between shipping fast and shipping bugs fast.

Knowing when to abandon the AI approach. This is the one nobody talks about. Sometimes the AI can't help. Maybe the problem is too novel. Maybe the codebase context is too deep. Maybe the AI keeps circling the same wrong solution. The best engineers recognize this quickly, switch to manual mode, and don't waste time wrestling with a tool that's not working for this particular problem.
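To make the second skill concrete, here's the kind of subtle bug AI-generated code often contains. This is a hypothetical illustration of my own, not output from any particular model: a pagination helper that compiles, reads cleanly, and passes the happy path, but silently drops the final partial page.

```python
import math

def page_count_plausible(total_items: int, page_size: int) -> int:
    """Looks reasonable -- correct syntax, sensible name -- but floor
    division drops the last partial page of results."""
    return total_items // page_size

def page_count_correct(total_items: int, page_size: int) -> int:
    """Ceiling division accounts for a partial final page."""
    return math.ceil(total_items / page_size)

# The bug only surfaces when total_items isn't a multiple of page_size:
assert page_count_plausible(90, 30) == page_count_correct(90, 30) == 3
assert page_count_plausible(100, 30) == 3   # wrong: 10 items unreachable
assert page_count_correct(100, 30) == 4
```

A unit test on a round number would pass both versions. Catching this in review takes exactly the skeptical read you'd give a junior developer's PR.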

How We Actually Run the Interview

I'll share our process. Steal it, modify it, whatever works. The point isn't that our specific format is magical. The point is that it tests the right things.

The Open-Toolbox Project

We send candidates a realistic problem. Not a LeetCode puzzle -- a simplified version of something we've actually built. "Build an API endpoint that ingests webhook events, deduplicates them, and stores the results." Time limit: a few hours. They can use anything they want. AI tools, documentation, their own code, whatever.
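For reference, here's one plausible shape of the core logic for that take-home, stripped of any web framework and with an in-memory dict standing in for real storage. The class and method names are my own; a candidate's solution could look quite different.

```python
import json
from dataclasses import dataclass, field

@dataclass
class WebhookIngestor:
    """Core of the take-home: ingest webhook events, dedupe by event ID,
    store the results. Storage here is an in-memory dict."""
    seen: set = field(default_factory=set)
    stored: dict = field(default_factory=dict)

    def ingest(self, raw_body: str) -> tuple[int, str]:
        # Malformed JSON gets a 400, not a crash -- exactly the edge case
        # we probe in the follow-up conversation.
        try:
            event = json.loads(raw_body)
        except json.JSONDecodeError:
            return 400, "malformed JSON"
        event_id = event.get("id")
        if event_id is None:
            return 422, "missing event id"
        if event_id in self.seen:
            # Idempotent: duplicates get a success response but no re-store.
            return 200, "duplicate ignored"
        self.seen.add(event_id)
        self.stored[event_id] = event
        return 201, "stored"

ingestor = WebhookIngestor()
assert ingestor.ingest('{"id": "evt_1", "type": "payment"}') == (201, "stored")
assert ingestor.ingest('{"id": "evt_1", "type": "payment"}') == (200, "duplicate ignored")
assert ingestor.ingest('not json')[0] == 400
```

The interesting conversation isn't whether they wrote something like this -- it's whether they can defend the choices inside it: why dedupe on the event ID, why return success for duplicates, what changes at 100x traffic.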

But we also ask them to keep rough notes on their process. Which parts did AI generate? Where did they modify the AI's output and why? Which parts did they write from scratch? This isn't a test of honesty -- it's a test of metacognition. Engineers who can articulate their process have thought about it deliberately.

The follow-up conversation matters more than the code. We spend thirty minutes drilling into their decisions. "Why this database schema?" "What would break if this endpoint got 100x the traffic?" "Walk me through the error handling -- what happens if the webhook provider sends malformed JSON?"

If they can explain every line and defend every decision, the AI assistance made them faster without making them shallow. That's the goal.

Unassisted Problem Solving

We give them a system design problem for forty minutes with no AI. This isn't about punishing them for liking AI. It's about verifying they can think independently.

AI tools go down. Engineers hit problems the model hasn't seen. They encounter situations where the AI is confidently wrong and only independent reasoning can catch it. I need to know they have a foundation.

I'm watching for structured thinking, not correct answers. Do they identify constraints? Do they reason about trade-offs? Can they sketch an approach and explain its weaknesses?

Watching Them Work With AI

Then we flip it. Twenty minutes, a smaller problem, and we explicitly ask them to use their preferred AI tools while we observe.

This is the most informative twenty minutes of the entire process. You learn:

How they prompt. "Make this work" vs. "Implement a rate limiter using a sliding window algorithm, max 100 requests per minute per API key, returning a 429 with retry-after header when exceeded." The specificity of their prompts reveals their decomposition skill.

How they react to bad output. Do they accept without reading? Do they spot the issue immediately? Do they refine the prompt or just edit the code directly? Their response time tells you how experienced they are with AI workflows.

Whether they have a rhythm. Experienced AI users develop a natural cadence -- generate, scan, accept or reject, move forward. Less experienced ones hesitate. They're unsure whether to trust the output, unsure when to intervene, unsure how to iterate.
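The specific prompt above is well-scoped enough that you can check the AI's output against a reference in your head. As a sketch of what satisfying it looks like -- class and method names are my own choices, and a deque of timestamps is one of several valid data structures -- the sliding window might come out like this:

```python
import time
from collections import defaultdict, deque

class SlidingWindowLimiter:
    """Sliding-window rate limiter per the prompt: max 100 requests per
    60-second window per API key; over-limit requests get a 429 and a
    Retry-After value in seconds."""

    def __init__(self, limit: int = 100, window: float = 60.0):
        self.limit = limit
        self.window = window
        self.hits: dict = defaultdict(deque)  # api_key -> request timestamps

    def check(self, api_key: str, now: float = None):
        now = time.monotonic() if now is None else now
        q = self.hits[api_key]
        # Evict timestamps that have aged out of the window.
        while q and now - q[0] >= self.window:
            q.popleft()
        if len(q) >= self.limit:
            # Retry-After: seconds until the oldest hit falls out of the window.
            return 429, round(self.window - (now - q[0]), 3)
        q.append(now)
        return 200, None

limiter = SlidingWindowLimiter(limit=2, window=60.0)
assert limiter.check("key-a", now=0.0) == (200, None)
assert limiter.check("key-a", now=1.0) == (200, None)
assert limiter.check("key-a", now=2.0) == (429, 58.0)   # third hit inside the window
assert limiter.check("key-a", now=61.0) == (200, None)  # oldest hit has aged out
```

Whether the candidate spots a missing eviction step, a fixed window masquerading as a sliding one, or a Retry-After that's always zero tells you more than any whiteboard round would.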

The Process Conversation

We end with ten minutes of just talking. "How do you approach a new feature from zero?" "When do you reach for AI vs. coding manually?" "Tell me about a time the AI led you astray and how you recovered."

Engineers who've thought deliberately about how AI fits into their workflow have specific answers. "I use Claude for scaffolding new modules but always rewrite the error handling myself because it consistently gets our retry logic wrong." That's intentionality. That's someone who's iterated on their process, not just adopted a tool.

Validating the Hire With Data

The interview gets you in the door. The data tells you whether the door was the right one.

Once someone joins, their merged PRs get scored automatically. Within four to six weeks, you can see their velocity trajectory. Are they ramping? Are they shipping work at the complexity level the interview predicted? Are they adopting AI tools effectively in your actual codebase, not just in a controlled interview setting?

At Headline, this closed the loop on hiring. We stopped being surprised by six-month performance reviews because the objective output data told us at the six-week mark whether someone was tracking well or needed a course correction.

Build the Interview for the Job That Exists

The engineering job changed. The tools changed. The leverage points changed. If your interview is still testing whether someone can write a balanced binary tree from memory, you're screening for a job description from 2019.

Test decomposition, evaluation, and iteration. Watch them work with AI. Verify they can think without it. Then measure what they actually ship once they start.

The engineers who'll drive the most impact aren't the ones with the longest resumes or the highest LeetCode scores. They're the ones who've figured out how to compound their intelligence with AI and ship complex work consistently. Your interview process should find those people.


GitVelocity measures engineering velocity by scoring every merged PR using AI. Know whether your new hires are performing within weeks, not months.


Written by Conrad Chu

Conrad is CTO and Partner at Headline, where he leads data-driven investment across early stage and growth funds with over $4B in AUM. Before becoming an investor, he founded Munchery (raised $130M+) and held engineering and product leadership roles at IAC and Convio (IPO 2010). He and the Headline engineering team built GitVelocity to help engineering organizations roll out agentic coding and measure its impact.