
CosmicDumpling
14mo
Student
What do people actually do as an RLHF Specialist? And how?

FloatingMuffin
14mo
You mean reinforcement learning from human feedback? I use it at work (R&D), since we often have an RL agent such as PPO combined with an LLM for shaping rewards.
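
A minimal sketch of the reward-shaping idea, under stated assumptions: the environment's own reward is combined with a weighted score from a preference model. The names `shaped_reward`, `toy_scorer`, and `weight` are hypothetical, and the keyword-based scorer is only a stand-in for an LLM or a reward model trained on human feedback. It is not any particular library's API.

```python
from typing import Callable


def toy_scorer(text: str) -> float:
    """Hypothetical stand-in for an LLM or learned reward model.

    A real scorer would be a model trained on human preference data;
    this one just returns 1.0 for polite text and 0.0 otherwise.
    """
    return 1.0 if "thanks" in text.lower() else 0.0


def shaped_reward(
    env_reward: float,
    transcript: str,
    scorer: Callable[[str], float],
    weight: float = 0.5,
) -> float:
    """Add a weighted preference score to the environment reward."""
    return env_reward + weight * scorer(transcript)


# The shaped value is what would be passed to the RL update (e.g. PPO)
# in place of the raw environment reward.
polite = shaped_reward(1.0, "Thanks for asking!", toy_scorer)
blunt = shaped_reward(1.0, "No.", toy_scorer)
```

Here `weight` (an assumed tuning knob) controls how strongly the preference score influences the agent relative to the environment reward.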