Beyond Sparse Rewards: Enhancing Reinforcement Learning with Language Model Critique in Text Generation
Jan 1, 2024 · Meng Cao, Lei Shu, Lei Yu, Yun Zhu, Nevan Wichers, Yinxiao Liu, Lei Meng
Type: Journal article
Publication: arXiv preprint arXiv:2401.07382
Last updated on Jan 1, 2024