Mistral AI | Gen.QA

Code Generation QA

Devstral 2: SWE-bench Verified Plus Cline Human Evals

How Mistral evaluated Devstral 2 and Devstral Small 2: 72.2% and 68.0% on SWE-bench Verified, respectively, plus Cline-scaffolded human win-rate comparisons.

December 9, 2025 | 4 min read

Devstral on SWE-bench: Open Code Agents That Actually Pass

How Mistral AI's Devstral Small 1.1 (53.6%) and Devstral Medium (61.6%) score on SWE-bench Verified, and what their no-test-time-scaling claim means for QA.

July 10, 2025 | 4 min read

Devstral: Mistral’s Agentic Coder on SWE-bench Verified

Devstral, the 24B agentic coder from Mistral AI and All Hands AI, scored 46.8% on SWE-bench Verified under OpenHands, beating models many times its size.

May 21, 2025 | 4 min read

Codestral 25.01: Inside Mistral’s Code Benchmarks

How Mistral's Codestral 25.01 scores code generation: 86.6% HumanEval, 95.3% fill-in-the-middle pass@1, and a #1 Copilot Arena debut, explained.

January 13, 2025 | 4 min read

Codestral: How Mistral Evaluates Its First Code Model

How Mistral evaluates Codestral, its first code model (22B parameters), across HumanEval, MBPP, CruxEval, RepoBench, Spider, and fill-in-the-middle benchmarks.

May 29, 2024 | 4 min read