My former PhD student Junxiao Song pushing the limits with DeepSeek

As we all know, DeepSeek has recently pushed the limits in LLMs through groundbreaking innovations. I’m proud to share that my former PhD student Junxiao Song serves as a Principal Researcher at DeepSeek AI, where he:
- Proposed GRPO (Group Relative Policy Optimization), a novel reinforcement learning algorithm that has been used to train nearly all models in the DeepSeek series, e.g., DeepSeek-R1.
- Co-developed DeepSeek-V3 (a 671B-parameter MoE model) and DeepSeek-V2, achieving GPT-4-level performance at roughly 1/10 the training cost.
- Created the novel reinforcement learning pipeline in DeepSeek-R1, eliminating the need for supervised fine-tuning.
- Pioneered resource-efficient training, enabling a 671B-parameter model on a $5.5M compute budget.
- Developed model distillation techniques producing state-of-the-art 7B/70B variants.
- Led DeepSeek-Prover-V1.5, integrating Lean 4 for formal theorem proving.
- Contributed to DeepSeek-Coder-V2, which surpasses closed-source models in code intelligence.
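For readers curious about GRPO: its core idea is to replace the learned critic of PPO with a group-relative baseline, i.e., sample a group of completions per prompt and normalize each completion's reward by the group's mean and standard deviation. A minimal sketch of that advantage computation (function name and example rewards are illustrative; the full algorithm adds a clipped PPO-style policy objective and a KL penalty on top):

```python
import statistics

def grpo_advantages(rewards):
    """Group-relative advantages: center each reward on the group mean
    and scale by the group standard deviation (no learned critic)."""
    mean = statistics.mean(rewards)
    std = statistics.pstdev(rewards) or 1.0  # guard against zero variance
    return [(r - mean) / std for r in rewards]

# One prompt, a group of G=4 sampled completions with scalar rewards:
advantages = grpo_advantages([1.0, 0.0, 0.5, 0.5])
```

Because the baseline comes from the group itself, no separate value network needs to be trained, which is part of what makes the approach resource-efficient.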
His work has positioned DeepSeek as the first Chinese AI company to rival OpenAI’s most advanced models while overcoming U.S. semiconductor sanctions through optimized training on restricted NVIDIA H800 GPUs.