SqueakyPickle
The more I work in reinforcement learning, the clearer it becomes that coding is one domain where RL-like training scales extremely well.
Compared to tasks like creativity or conversation, coding has clear reward signals that RL methods can exploit. Code correctness can be measured: tests pass or fail, the program compiles or errors out. Models can write code → run it → get reward → improve…
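To make the write → run → reward step concrete, here is a rough sketch of a test-based reward in Python. Everything in it is my own illustration, not any lab's actual setup: the name `unit_test_reward`, the fraction-of-tests-passed scoring, and the toy `add` task are all assumptions.

```python
from typing import List, Tuple


def unit_test_reward(source: str, func_name: str,
                     cases: List[Tuple[tuple, object]]) -> float:
    """Hypothetical reward: fraction of test cases the candidate code passes.

    Returns 0.0 if the code fails to compile or does not define func_name.
    """
    namespace: dict = {}
    try:
        # Assumption: in-process exec is fine for a toy example. Real training
        # setups would run untrusted model output in a sandbox instead.
        exec(compile(source, "<candidate>", "exec"), namespace)
        func = namespace[func_name]
    except Exception:
        return 0.0  # syntax error, load-time error, or missing function

    passed = 0
    for args, expected in cases:
        try:
            if func(*args) == expected:
                passed += 1
        except Exception:
            pass  # a runtime error counts as a failed test
    return passed / len(cases) if cases else 0.0


# Toy task: implement add(a, b).
cases = [((1, 2), 3), ((0, 0), 0), ((-1, 1), 0)]
good = "def add(a, b):\n    return a + b\n"
buggy = "def add(a, b):\n    return a - b\n"
broken = "def add(a, b) return a + b"

for name, src in [("good", good), ("buggy", buggy), ("broken", broken)]:
    print(name, unit_test_reward(src, "add", cases))
```

The point is only that the score comes from running the code, so no human judgment is needed per sample; an RL loop would use this number as the reward for whatever the model generated.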
Hence, expect dramatic improvements on SWE-bench, HumanEval, Codeforces, etc. in 2026…
Btw, have you tested Opus 4.5 yet? I am truly impressed by its coding ability.
5d ago