Blockswap

PerkyPotato · Grapevine · 19mo

Large Reasoning Models (LRMs) can overtake LLMs...

Here's my quick 3-minute breakdown:

  1. o1-preview: 97.8% on PlanBench Blocksworld vs. 62.5% for top LLMs, indicating a shift from retrieval to reasoning.
  2. 52.8% on obfuscated "Mystery Blocksworld" vs. near-zero for LLMs, suggesting a...