DRLC: Reinforcement learning with dense rewards from LLM critic

Jan 1, 2024 · Meng Cao, Lei Shu, Lei Yu, Yun Zhu, Nevan Wichers, Yinxiao Liu, Lei Meng

Type: Journal article
Publication: arXiv e-prints
Last updated on Jan 1, 2024