Microsoft Study Reveals AI Still Struggles to Debug Code Effectively

AI models from companies such as OpenAI and Anthropic are now routinely used for software development tasks. According to Google CEO Sundar Pichai, “25% of new code” at Google is generated by AI. Meta CEO Mark Zuckerberg also wants to “widely deploy AI coding models” within the company.

This shows how fast AI tools are being adopted in the programming world. But when it comes to debugging, even advanced models are not performing as well as expected.

Microsoft Study Highlights Weaknesses

A new study by Microsoft Research tested nine AI models, including Claude 3.7 Sonnet, o1, and o3-mini, on a benchmark called SWE-bench Lite. The models were asked to solve 300 real debugging tasks and were given access to tools such as a Python debugger.

The best performer was Claude 3.7 Sonnet, with a 48.4% success rate. OpenAI’s o1 followed with 30.2%, and o3-mini solved only 22.1% of the tasks.
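To put those percentages in perspective, the short sketch below converts each reported success rate into an approximate number of tasks out of the 300 in the benchmark. The rates and the task total come from the article; the rounding is an assumption, since the reported rates do not map to whole task counts and may be averages over several runs. The names `approx_solved` and `approx_counts` are made up for this illustration.

```python
# Success rates as reported in the article (percent of tasks solved).
TOTAL_TASKS = 300
success_rates = {
    "Claude 3.7 Sonnet": 48.4,
    "o1": 30.2,
    "o3-mini": 22.1,
}


def approx_solved(rate_percent: float, total: int = TOTAL_TASKS) -> int:
    """Convert a percentage success rate into an approximate task count.

    Rounding to the nearest whole task is an assumption made here,
    not something the study itself reports.
    """
    return round(rate_percent / 100 * total)


approx_counts = {name: approx_solved(rate) for name, rate in success_rates.items()}

for name, count in approx_counts.items():
    unsolved = TOTAL_TASKS - count
    print(f"{name}: roughly {count} of {TOTAL_TASKS} solved, roughly {unsolved} unsolved")
```

Read this way, even the strongest model in the study left more than half of the tasks unsolved.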

The researchers attributed the low success rates to two factors: the models made poor use of the debugging tools, and there is little training data showing how real developers fix problems. They wrote,

“We strongly believe that training or fine-tuning [models] can make them better interactive debuggers.”

But they also added that this would need “trajectory data that records agents interacting with a debugger to collect necessary information before suggesting a bug fix.”

Why Human Developers Still Matter

Although AI models are improving, they still fall short of experienced human programmers, especially in debugging. One evaluation found that Devin, a popular AI coding tool, could complete only 3 out of 20 programming tests.

The Microsoft study is a clear reminder that AI still has major limitations. It may not stop companies from investing in AI tools, but it does raise important concerns. As the report says, some models struggle to

“understand how different tools might help with different issues.”

Meanwhile, prominent figures in tech are confident that coding careers are safe. Microsoft co-founder Bill Gates has said programming will remain important, and Replit CEO Amjad Masad, Okta CEO Todd McKinnon, and IBM CEO Arvind Krishna have all echoed that view.
