CosmicDumpling
14mo
by
Student

What do people actually do as an RLHF specialist? And how?

FloatingMuffin
14mo

You mean reinforcement learning from human feedback? I use it at work (R&D): we often have an RL agent trained with an algorithm like PPO, combined with an LLM for reward shaping.
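
To make that a bit more concrete, here is a rough sketch of the two pieces mentioned: reward shaping with an LLM-derived score, and the clipped objective PPO optimizes per sample. This is only an illustration, not a description of any particular production setup. The names `shaped_reward`, `toy_llm_score`, and `beta` are hypothetical, and `toy_llm_score` is a stand-in for a real LLM or reward-model call.

```python
from typing import Callable


def toy_llm_score(text: str) -> float:
    """Hypothetical stand-in for an LLM / reward-model call.

    A real setup would query a model here; this toy version just
    rewards responses that contain the word "thanks".
    """
    return 1.0 if "thanks" in text.lower() else 0.0


def shaped_reward(
    env_reward: float,
    text: str,
    score_fn: Callable[[str], float],
    beta: float = 0.1,
) -> float:
    """Add a weighted LLM-derived score to the environment reward.

    beta is an assumed mixing weight, not a recommended value.
    """
    return env_reward + beta * score_fn(text)


def ppo_clipped_objective(ratio: float, advantage: float, eps: float = 0.2) -> float:
    """Per-sample PPO clipped surrogate objective.

    ratio is pi_new(a|s) / pi_old(a|s); the objective takes the more
    pessimistic of the unclipped and clipped terms.
    """
    clipped_ratio = max(1.0 - eps, min(ratio, 1.0 + eps))
    return min(ratio * advantage, clipped_ratio * advantage)


if __name__ == "__main__":
    r = shaped_reward(1.0, "Thanks a lot", toy_llm_score, beta=0.5)
    print("shaped reward:", r)
    print("clipped objective:", ppo_clipped_objective(1.5, 2.0))
```

In practice the shaped reward would feed into the advantage estimates, and the objective would be averaged over a batch and maximized with gradient ascent on the policy parameters.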
