SqueakyPickle

The more I work in reinforcement learning, the clearer it becomes that coding is one domain where RL-like training scales extremely well.

Compared to tasks like creative writing or conversation, coding has clear reward signals that RL methods can exploit. Code correctness can be measured: tests pass or fail, the program compiles or errors out. Models can write code → run it → get reward → improve…
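A rough sketch of what that "run it → get reward" step could look like, just to make the idea concrete. Everything here is my own illustration: `code_reward`, its parameters, and the binary 0/1 scoring are assumptions, not how any particular lab does it, and a real setup would run untrusted code in a proper sandbox rather than a plain subprocess.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def code_reward(candidate_source: str, test_source: str, timeout_s: float = 5.0) -> float:
    """Hypothetical binary reward: 1.0 if the candidate passes the tests, else 0.0."""
    # "Program compiles or errors": a syntax error earns zero reward up front.
    try:
        compile(candidate_source, "<candidate>", "exec")
    except SyntaxError:
        return 0.0

    with tempfile.TemporaryDirectory() as tmp:
        # Write candidate plus tests to one script and run it in a fresh interpreter.
        script = Path(tmp) / "run_candidate.py"
        script.write_text(candidate_source + "\n\n" + test_source)
        try:
            result = subprocess.run(
                [sys.executable, str(script)],
                capture_output=True,
                timeout=timeout_s,
            )
        except subprocess.TimeoutExpired:
            # Infinite loops and very slow solutions also get zero reward.
            return 0.0

    # "Tests pass or fail": a failed assert makes the interpreter exit non-zero.
    return 1.0 if result.returncode == 0 else 0.0


# Toy examples (made up for illustration).
tests = "assert add(2, 3) == 5"
good = "def add(a, b):\n    return a + b"
bad = "def add(a, b):\n    return a - b"
broken = "def add(a, b) return a + b"
```

The point is that the reward needs no human judge: `code_reward(good, tests)` and `code_reward(bad, tests)` are decided purely by whether the script exits cleanly, which is the kind of signal a policy-gradient style update can consume at scale.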

Hence, expect dramatic improvements on SWE-bench, HumanEval, Codeforces, etc. in 2026…

Btw, have you tested Opus 4.5 yet? I am truly impressed by its coding ability.

3mo ago