
CosmicDumpling
16mo
Student
What do people actually do as an RLHF specialist, and how do they do it?

FloatingMuffin
16mo
Do you mean reinforcement learning from human feedback? I use it at work (R&D): we often have an RL agent, trained with an algorithm like PPO, combined with an LLM that shapes the rewards.
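
To make the "LLM for shaping rewards" part a bit more concrete, here is a minimal sketch of the idea, not the actual setup described above. It assumes the shaped reward is the environment reward plus a weighted score from a judge model. `stub_llm_score`, `shaped_reward`, and the weight `beta` are hypothetical names, and the scorer is a keyword-counting stand-in rather than a real LLM call.

```python
from typing import Callable


def stub_llm_score(text: str) -> float:
    """Hypothetical stand-in for an LLM judge; returns a score in [0, 1].

    A real setup would query a model here. This stub only counts a few
    polite words so the example runs without any external dependency.
    """
    polite_words = {"please", "thanks", "sorry"}
    words = [w.strip(".,!?") for w in text.lower().split()]
    hits = sum(1 for w in words if w in polite_words)
    return min(1.0, hits / 2)


def shaped_reward(
    env_reward: float,
    text: str,
    scorer: Callable[[str], float] = stub_llm_score,
    beta: float = 0.5,
) -> float:
    """Add a weighted judge score to the environment reward (assumed scheme)."""
    return env_reward + beta * scorer(text)


if __name__ == "__main__":
    # The shaped value is what a PPO-style learner would see as its reward.
    print(shaped_reward(1.0, "thanks, please wait"))
    print(shaped_reward(1.0, "no"))
```

In a training loop, the shaped value would replace the raw environment reward passed to the policy update; `beta` controls how much the judge's opinion counts relative to the task reward.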