0. Sample Blog
Published:
This is a sample blog post.
Published:
This is a sample blog post.
Published:
Some Other Thoughts
Published:
About Position Embedding: Sinusoidal PE & RoPE
Published:
Sinusoidal PE vs. RoPE
Published:
Introduction to Proximal Policy Optimization (PPO)
Published:
Introduction to Group Relative Policy Optimization (GRPO)
Published:
Introduction to KL divergence, entropy, and cross-entropy, and the difference between forward KL and reverse KL.
Published:
Derivation of common loss functions from a unified MLE / NLL perspective: MSE, MAE, CE, and beyond.
Published:
Deriving MSE, MAE, KL, and InfoNCE losses from the Information Bottleneck framework, unified via mutual information.
Published:
On-policy distillation (OPD): combining SFT’s dense supervision with RL’s on-policy property, plus a tour of self-distillation works (OPSD, SDFT, SDPO, CRISP, ExOPD, GAD).
Published:
A visual companion to blog 8. Image panels collected from an external author’s note (watermarks preserved).