
Arash Ahmadian on Rethinking RLHF
Arash Ahmadian is a researcher at Cohere and Cohere For AI focused on preference training of large language models. He is also a researcher at the Vector Institute of AI.
Featured Reference
Back to Basics: Revisiting REINFORCE Style Optimization for Learning from Human Feedback in LLMs
Arash Ahmadian, Chris Cremer, Matthias Gallé, Marzieh Fadaee, Julia Kreutzer, Olivier Pietquin, Ahmet Üstün, Sara Hooker
Additional References
- Self-Rewarding Language Models, Yuan et al 2024
- Reinforcement Learning: An Introduction, Sutton and Barto 1998
- Learning from Delayed Rewards, Chris Watkins 1989
- Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning, Williams 1992