
JumpyTaco
Paper with Code: You can now run LLMs without Matrix Multiplications
Saw this paper: https://arxiv.org/pdf/2406.02528
In essence, MatMul operations can be completely eliminated from LLMs while maintaining strong performance at billion-parameter scales. On top of that, by using an optimised kernel during inference, the model's memory consumption can be reduced by more than 10× compared to unoptimised models.
source: https://x.com/rohanpaul_ai/status/1799122826114330866
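For intuition on how a matmul can disappear: the paper constrains weights to ternary values, and with weights in {-1, 0, +1} a dense layer reduces to signed sums of inputs, with no multiplications. Here's a minimal sketch of that idea (my own illustration, not the paper's fused kernel; the function name `ternary_matvec` is made up):

```python
import numpy as np

# Sketch: with ternary weights in {-1, 0, +1}, y = W @ x needs no
# multiplications -- each output is a signed sum of input elements.

def ternary_matvec(W, x):
    """MatMul-free y = W @ x for ternary W, using only adds/subtracts."""
    y = np.zeros(W.shape[0], dtype=x.dtype)
    for i in range(W.shape[0]):
        y[i] = x[W[i] == 1].sum() - x[W[i] == -1].sum()
    return y

rng = np.random.default_rng(0)
W = rng.integers(-1, 2, size=(4, 8))        # ternary weight matrix
x = rng.standard_normal(8).astype(np.float32)

# Matches the ordinary matmul result
assert np.allclose(ternary_matvec(W, x), W @ x, atol=1e-5)
```

Ternary weights also pack into ~1.58 bits each, which is where much of the claimed memory saving comes from.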
13mo ago
