Sitemap

A list of all the posts and pages found on the site. For you robots out there, there is an XML version available for digesting as well.

Posts

DeepSeek Group Relative Policy Optimization (GRPO) and its findings/thoughts on RL

Published: January 25, 2025

DeepSeek-R1 uses Group Relative Policy Optimization (GRPO) to improve LLM reasoning capability. This article examines the motivation and mechanisms behind GRPO and summarizes DeepSeek's findings on applying RL to LLMs.

How does DeepSeek-r1 obtain its superior reasoning capability via RL

Published: January 23, 2025

The DeepSeek-R1 model series has achieved strong reasoning capability. This post studies the core of its reinforcement-learning-based algorithm.

A Brief Summary on Agent Tuning

Published: January 22, 2025

This is a brief introduction to agent tuning, covering its motivation, challenges, common practices, and existing datasets.

Shuo Li

Pages

Page Not Found

About Me

Archive Layout with Content

Posts by Category

Collaborations

Posts by Collection

CV

Markdown

Page not in menu

Page Archive

Portfolio

Projects

Publications

Sitemap

Posts by Tags

Talk map

Talks and presentations

Talks

Teaching

Terms and Privacy Policy

Blog posts

Jupyter notebook markdown generator
