Microsoft | Gen.QA

Code Generation QA

WizardCoder: How Microsoft Evolved Code LLM Benchmarks

How Microsoft's WizardCoder uses Evol-Instruct to reach 57.3% pass@1 on HumanEval, outscoring Claude and Bard on code benchmarks with just 15B parameters.

June 14, 2023 4 min read

CodeXGLUE Explained: Microsoft’s Code Generation Benchmark

A practitioner's guide to CodeXGLUE, Microsoft Research's 14-dataset, 10-task benchmark for code understanding and generation, plus what its metrics miss.

February 9, 2021 4 min read

How CodeBLEU Scores AI-Generated Code (Microsoft Research)

CodeBLEU is Microsoft Research's metric for grading generated code: it adds AST and data-flow matching to n-gram overlap and correlates with human judgment better than BLEU.

September 22, 2020 4 min read