Beyond Sparse Rewards: Enhancing Reinforcement Learning with Language Model Critique in Text Generation

Jan 1, 2024 · Meng Cao, Lei Shu, Lei Yu, Yun Zhu, Nevan Wichers, Yinxiao Liu, Lei Meng

Type: Journal article

Publication: arXiv preprint arXiv:2401.07382

Last updated on Jan 1, 2024
