Blogs

0. Example Blog

Published: February 02, 2026

This is an example blog post.

0. Other Thoughts

Published: February 05, 2026

Some other thoughts.

1. Position Embedding

Published: February 03, 2026

About position embedding: sinusoidal PE and RoPE.

2. Thoughts on Position Embedding

Published: February 04, 2026

Sinusoidal PE vs. RoPE

3. Proximal Policy Optimization (PPO)

Published: February 11, 2026

An introduction to Proximal Policy Optimization (PPO).

4. Group Relative Policy Optimization (GRPO)

Published: February 11, 2026

An introduction to Group Relative Policy Optimization (GRPO).

5. KL Divergence: From Entropy to Forward/Reverse KL

Published: April 25, 2026

An introduction to KL divergence, entropy, and cross-entropy, and to the difference between forward KL and reverse KL.

6. Deriving Common Loss Functions: From MLE to MSE / MAE / CE

Published: April 28, 2026

Derivation of common loss functions from a unified MLE / NLL perspective: MSE, MAE, CE, and beyond.

7. An Information Bottleneck Perspective: Deriving MSE / MAE / KL / InfoNCE from Mutual Information

Published: April 28, 2026

Deriving MSE, MAE, KL, and InfoNCE losses from the Information Bottleneck framework, unified via mutual information.

8. On-Policy Distillation: From SFT/RL to Self-Distillation

Published: April 28, 2026

On-policy distillation (OPD): combining SFT’s dense supervision with RL’s on-policy property, plus a tour of self-distillation works (OPSD, SDFT, SDPO, CRISP, ExOPD, GAD).

9. On-Policy Distillation: An Illustrated Quick Tour (Compiled from External Sources)

Published: April 28, 2026

A visual companion to blog 8. The image panels are collected from an external author’s note (watermarks preserved).