Why Surveys Fall Short in Measuring the Impact of AI Coding
Stephen Poletto, Field CTO • Nov 13, 2025

Every week, I talk with engineering leaders who are trying to make sense of what AI is really doing inside their organizations. They want to know if the investment in AI coding tools is paying off, if developers are actually faster, and if the quality of code is improving.
It sounds like an easy question. Yet when we dig in, most teams admit they don’t really know. They have surveys, anecdotes, and dashboards that show adoption rates, but very little evidence of what has changed in the code itself.
Surveys have become the default way to fill that gap. They’re simple to run and easy to interpret. Ask developers how they feel about AI tools, ask leaders how much faster they think their teams have become, and you end up with charts that look convincing. But those charts often miss the truth.
What the Data Really Shows
A study from METR (Model Evaluation and Threat Research) published in July 2025 highlighted this problem clearly. Researchers asked experienced open-source developers to complete real GitHub issues, some with access to AI coding tools and some without.
The outcome was surprising. Developers who used AI tools took 19% longer to complete the same work. Yet even after experiencing the slowdown firsthand, they still believed AI had sped them up by 20%.
That belief persisted despite the data showing the opposite. Developers genuinely felt faster, even though they were slower. It is a powerful example of how perception-based measures like surveys can be misleading when assessing the real impact of AI on productivity.
The Science Backs It Up
This gap between how productive people feel and how productive they are has been observed before in software engineering research.
In a Stanford study led by Yegor Denisov-Blanch, researchers asked 43 professional engineers to rate their own productivity in percentile buckets, such as “I’m around the 70th percentile.” They then compared those self-ratings to measured productivity based on objective performance data.
The correlation between the two was almost nonexistent: R² = 0.03. In simple terms, self-assessments explained only about 3% of the variance in real productivity. Developers were often off by 30 percentage points or more when estimating their own performance.
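To make that figure concrete, here is a minimal illustrative sketch in Python (using NumPy and synthetic, made-up numbers rather than the Stanford data) of how an R² is computed from self-rated versus measured productivity percentiles, and why a value near 0.03 means the self-ratings carry almost no signal:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic example only: 43 engineers' measured productivity percentiles,
# and self-ratings that are mostly noise clustered around the middle of the scale.
measured = rng.uniform(5, 95, size=43)
self_rated = 50 + 0.15 * (measured - 50) + rng.normal(0, 20, size=43)

# R^2 of a simple linear fit predicting measured productivity from self-ratings.
slope, intercept = np.polyfit(self_rated, measured, 1)
predicted = slope * self_rated + intercept
ss_res = np.sum((measured - predicted) ** 2)
ss_tot = np.sum((measured - measured.mean()) ** 2)
r_squared = 1 - ss_res / ss_tot

# A value close to zero: self-ratings explain almost none of the variance.
print(f"R^2 = {r_squared:.2f}")
```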
Denisov-Blanch’s conclusion was clear: self-assessment surveys are an inaccurate way to measure developer productivity. If people can’t reliably gauge their baseline output, it’s nearly impossible to use surveys alone to measure something as complex and variable as the impact of AI-assisted coding.
Why Surveys Alone Can’t Tell the Full Story
I see this play out constantly with customers. Teams run surveys that ask questions like, “How much has AI improved your productivity?” or “How often do you use AI for coding?” The responses are interesting, but they rarely align with the data we later observe in the code itself.
There are a few reasons for that:
1. Definitions are inconsistent.
One developer’s “using AI” might mean accepting a single suggestion a day, while another might be shipping entire functions written by an assistant. Without clear definitions, survey results blur those distinctions together.
2. Perception doesn’t equal performance.
As both the METR and Stanford studies show, humans are poor judges of their own productivity. We tend to over-credit tools that feel helpful, even when they slow us down in practice.
3. Context matters.
AI’s impact varies across teams, projects, and environments. A small startup building new features might see big gains, while a large enterprise maintaining legacy systems might experience slowdowns. Surveys often miss that nuance.
Where Surveys Still Help
Despite their limits, surveys still play an important role. When used carefully, they capture the emotional and experiential side of development. They show how developers feel about their tools, where friction lives in collaboration, and what leaders should pay attention to next.
At Span, we see surveys as a valuable way to understand the developer experience. They point to where teams are thriving and where they’re struggling. But when it comes to measuring the impact of AI on actual delivery, they aren’t enough.
Measuring What Actually Hits Production
That limitation in existing approaches is why we built span-detect-1, our proprietary model that measures AI adoption directly in the code. It analyzes code shipped to production and identifies whether it was authored by humans or by AI, with over 95% accuracy across all AI tools.
This gives teams a verifiable view of AI’s real footprint. You can see what percentage of code that’s shipping is AI-generated, how that varies by team or project, and how those patterns relate to key metrics such as cycle time, rework, and defect rates.
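As a purely hypothetical sketch of that kind of analysis (the CSV export, column names, and buckets below are illustrative assumptions, not span-detect-1's actual interface), relating AI-authorship share to delivery metrics comes down to a join-and-aggregate exercise:

```python
import pandas as pd

# Hypothetical export: one row per merged pull request, with an AI-authorship
# share and delivery metrics. The file and columns are assumed for illustration.
prs = pd.read_csv("pr_metrics.csv")
# columns: team, ai_share, cycle_time_hours, rework_ratio, defects

# Bucket PRs by how much of their code was flagged as AI-authored.
prs["ai_bucket"] = pd.cut(
    prs["ai_share"],
    bins=[0, 0.25, 0.5, 0.75, 1.0],
    labels=["0-25%", "25-50%", "50-75%", "75-100%"],
    include_lowest=True,
)

# Compare delivery outcomes across buckets and teams.
summary = (
    prs.groupby(["team", "ai_bucket"], observed=True)
       [["cycle_time_hours", "rework_ratio", "defects"]]
       .mean()
       .round(2)
)
print(summary)
```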
Once you have that view, the right questions become possible:
Where does AI truly accelerate delivery, and where does it slow us down?
How much rework or review time is tied to AI-authored code?
Are we applying AI to the right kinds of work, or introducing hidden inefficiencies?
When those insights are paired with well-crafted surveys, you get the complete picture: what people feel, and what is actually true.
From Feelings to Facts
AI has already reshaped how software is written, but the hardest part is not generating code faster. It is understanding what that change means. Surveys reveal perception. Data reveals performance.
As a Field CTO, I see this gap every day. Teams do not just need dashboards. They need clarity they can trust. That means grounding every conversation about AI impact in verifiable evidence: the code that ships, the quality that follows, and the outcomes that persist.
The organizations that get this right do not choose between surveys and data. They combine them. They listen to their developers and validate what they hear against the ground truth. That is how they separate excitement from effectiveness and uncover where AI is truly helping them build better software.