SqueakyPickle

The more I work in Reinforcement Learning, the clearer it becomes that coding is one domain where RL-like training scales extremely well.

Compared to tasks like creative writing or conversation, coding has clear reward signals that RL methods can exploit. Code correctness can be measured: tests pass or fail, the program compiles or errors. Models can write code → run it → get reward → improve…
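To make the "write code → run it → get reward" step concrete, here is a minimal sketch of what such a reward function could look like. Everything in it is an assumption for illustration: the name `code_reward`, the three reward levels (0.0 / 0.1 / 1.0), and the plain-`assert` test format are made up here, not taken from any particular RL framework.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def code_reward(candidate_source: str, test_source: str, timeout_s: float = 5.0) -> float:
    """Hypothetical reward: 0.0 if the code does not compile,
    0.1 if it compiles but the tests fail, 1.0 if the tests pass.
    The numeric levels are arbitrary choices for this sketch."""
    # Step 1: does it even compile? compile() parses without executing.
    try:
        compile(candidate_source, "<candidate>", "exec")
    except (SyntaxError, ValueError):
        return 0.0

    # Step 2: run candidate + tests in a separate process with a timeout,
    # so crashes and infinite loops cannot take down the trainer.
    with tempfile.TemporaryDirectory() as tmp:
        script = Path(tmp) / "candidate_with_tests.py"
        script.write_text(candidate_source + "\n\n" + test_source, encoding="utf-8")
        try:
            result = subprocess.run(
                [sys.executable, str(script)],
                capture_output=True,
                timeout=timeout_s,
            )
        except subprocess.TimeoutExpired:
            return 0.1

    # A failed assert raises, which gives a non-zero exit code.
    return 1.0 if result.returncode == 0 else 0.1


if __name__ == "__main__":
    tests = "assert add(2, 3) == 5"
    print(code_reward("def add(a, b):\n    return a + b", tests))
    print(code_reward("def add(a, b):\n    return a - b", tests))
```

A real setup would sandbox the execution properly and score individual tests rather than a single pass/fail, but the point stands: the reward comes from running the code, not from a human or a learned judge.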

Hence, expect dramatic improvements on SWE-bench, HumanEval, Codeforces, etc. in 2026…

Btw, have you tested Opus 4.5 yet? I am truly impressed by its coding ability.

5d ago