DeepSeek

Code Generation QA

DeepSeek-Coder-V2: How HumanEval and MBPP Are Scored

DeepSeek-Coder-V2 scores 90.2% on HumanEval and 76.2% on MBPP via EvalPlus. A practitioner's guide to what each code benchmark actually measures and where it breaks.

June 17, 2024 · 4 min read

Code Generation QA

DeepSeek-Coder Explained: HumanEval, MBPP and Code Evals

How DeepSeek-Coder was trained and benchmarked on HumanEval, MBPP, DS-1000 and LeetCode, and what its 50.3% pass@1 means for AI code-generation QA.

January 25, 2024 · 4 min read