AI models from leading companies such as OpenAI and Anthropic are increasingly being used for software development. According to Google CEO Sundar Pichai, “25% of new code” at Google is now generated by AI. Meta CEO Mark Zuckerberg has also said he wants to “widely deploy AI coding models” within the company.

These statements show how quickly AI tools are being adopted in programming. When it comes to debugging, however, even advanced models still fall short of expectations.

Microsoft Study Highlights Weaknesses

A new study by Microsoft Research tested nine AI models, including Anthropic’s Claude 3.7 Sonnet and OpenAI’s o1 and o3-mini, on SWE-bench Lite, a benchmark of 300 real-world debugging tasks. While working on these tasks, the models had access to tools such as a Python debugger.

The best performer was Claude 3.7 Sonnet, with an average success rate of 48.4%. OpenAI’s o1 followed at 30.2%, and o3-mini solved only 22.1% of the tasks.

The researchers attributed the low success rates mainly to two factors: the models’ poor use of the available debugging tools, and a lack of training data showing how real developers work through problems. They wrote:

“We strongly believe that training or fine-tuning [models] can make them better interactive debuggers.”

However, they added that this would require “trajectory data that records agents interacting with a debugger to collect necessary information before suggesting a bug fix.”

Why Human Developers Still Matter

Although AI models are improving, they still fail to match experienced human programmers, especially at debugging. One evaluation found that Devin, a popular AI coding tool, completed only 3 of 20 programming tests.

The Microsoft study is a clear reminder that AI still has major limitations. It may not stop companies from investing in AI tools, but it does raise important concerns. As the report notes, some models struggle to “understand how different tools might help with different issues.”

Meanwhile, several prominent tech leaders have said they believe programming careers are safe. Microsoft co-founder Bill Gates has said programming will remain important, and Replit CEO Amjad Masad, Okta CEO Todd McKinnon, and IBM CEO Arvind Krishna have all expressed the same view.