Sitemap
A list of all the posts and pages found on the site. For you robots out there, there is an XML version available for digesting as well.
Pages
Posts
DeepSeek Group Relative Policy Optimization (GRPO) and its findings/thoughts on RL
Published:
DeepSeek-R1 uses Group Relative Policy Optimization (GRPO) to improve LLM reasoning capability. This article examines GRPO's motivation and mechanisms, and summarizes DeepSeek's findings on applying RL to LLMs.
How does DeepSeek-R1 obtain its superior reasoning capability via RL?
Published:
The DeepSeek-R1 model series has obtained strong reasoning capability. This post studies the core of its reinforcement-learning-based algorithm.
A Brief Summary on Agent Tuning
Published:
This is a brief introduction to agent tuning, covering its motivation, challenges, common practices, and existing datasets.