Code Generation QA
DeepSeek-Coder-V2: How HumanEval and MBPP Are Scored
DeepSeek-Coder-V2 scores 90.2% on HumanEval and 76.2% on MBPP via EvalPlus. A practitioner's guide to what each code benchmark actually measures and where it breaks.
