Arash Ahmadian on Rethinking RLHF

TalkRL: The Reinforcement Learning Podcast

A tartalmat a Robin Ranjit Singh Chauhan biztosítja. Az összes podcast-tartalmat, beleértve az epizódokat, grafikákat és podcast-leírásokat, közvetlenül a Robin Ranjit Singh Chauhan vagy a podcast platform partnere tölti fel és biztosítja. Ha úgy gondolja, hogy valaki az Ön engedélye nélkül használja fel a szerzői joggal védett művét, kövesse az itt leírt folyamatot https://hu.player.fm/legal.

1+ y ago 33:30

MP3•Epizód kép

Arash Ahmadian is a Researcher at Cohere and Cohere For AI focussed on Preference Training of large language models. He’s also a researcher at the Vector Institute of AI.

Featured Reference

Back to Basics: Revisiting REINFORCE Style Optimization for Learning from Human Feedback in LLMs

Arash Ahmadian, Chris Cremer, Matthias Gallé, Marzieh Fadaee, Julia Kreutzer, Olivier Pietquin, Ahmet Üstün, Sara Hooker

Additional References

Self-Rewarding Language Models, Yuan et al 2024
Reinforcement Learning: An Introduction, Sutton and Barto 1992
Learning from Delayed Rewards, Chris Watkins 1989
Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning, Williams 1992

73 epizódok

#Reinforcement Learning #Machine Learning #Robin Ranjit Singh Chauhan #Artificial Intelligence #Tech