Blog posts

2025

DeepSeek Group Relative Policy Optimization (GRPO) and its findings/thoughts on RL

Published: January 25, 2025

DeepSeek-R1 uses Group Relative Policy Optimization (GRPO) to improve LLM reasoning capability. This article examines GRPO's motivation and mechanisms, and summarizes DeepSeek's findings on applying RL to LLMs.

How does DeepSeek-r1 obtain its superior reasoning capability via RL

Published: January 23, 2025

The DeepSeek-R1 model series has obtained strong reasoning capability. This post studies the core of its reinforcement-learning-based algorithm.

A Brief Summary on Agent Tuning

Published: January 22, 2025

This is a brief introduction to agent tuning, covering its motivation, challenges, common practices, and existing datasets.

Shuo Li
