Code Generation QA
WizardCoder: How Microsoft Evolved Code LLM Benchmarks
How Microsoft's WizardCoder uses Evol-Instruct to hit 57.3 pass@1 on HumanEval, beating Claude and Bard on code benchmarks at just 15B parameters.
A practitioner's guide to CodeXGLUE, Microsoft Research's 14-dataset, 10-task benchmark for code understanding and generation, plus what its metrics miss.
CodeBLEU is Microsoft Research's metric for grading generated code with AST and data-flow matching, correlating with human judgment better than BLEU.