
Running LLMs in the browser
I'm developing a frontend portfolio website with an AI chatbot that can answer questions about me.
The main challenge I’m facing is that I don’t want to invest in hosting a backend server that makes calls to open-source LLMs.
My plan is to use a library like WebLLM or react-llm to run a small LLM directly in the browser. The approach downloads a significant amount of data (around 300 MB of model weights) into the browser, and then WebGPU in modern browsers runs the model. I successfully implemented this with SmolLM2-360M and the WebLLM library, but the model is too small and the chatbot's responses are not relevant.
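For reference, my setup looks roughly like this (a minimal sketch, not my exact code; the model ID and the shape of the progress callback are assumptions based on WebLLM's prebuilt model list and may differ by version):

```typescript
import { CreateMLCEngine } from "@mlc-ai/web-llm";

async function main() {
  // Downloads the model weights (cached by the browser) and compiles
  // the runtime for WebGPU. The model ID below is an assumption.
  const engine = await CreateMLCEngine("SmolLM2-360M-Instruct-q4f16_1-MLC", {
    initProgressCallback: (report) => console.log(report.text),
  });

  // OpenAI-style chat completion, running entirely in the browser.
  const reply = await engine.chat.completions.create({
    messages: [
      { role: "system", content: "You answer questions about my portfolio." },
      { role: "user", content: "What technologies do you work with?" },
    ],
  });
  console.log(reply.choices[0].message.content);
}

main();
```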
I need assistance either with preventing hallucinations by incorporating RAG techniques, or with alternative ideas, such as cost-effective backend hosting options or other methods for running LLMs in the browser.
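To make concrete what I mean by incorporating RAG on the client side, this is the kind of thing I'm imagining (a naive keyword-overlap sketch; the facts, function names, and prompt wording are placeholders, and a real setup would probably use an embedding model, e.g. via transformers.js):

```typescript
// Score a handful of facts about me by keyword overlap with the question,
// then prepend the best matches to the prompt so the small model answers
// from real context instead of hallucinating.
const facts = [
  "I am a frontend developer focused on React and TypeScript.",
  "My portfolio site runs an LLM chatbot entirely in the browser via WebGPU.",
  "I have built projects with WebLLM and SmolLM2-360M.",
];

function retrieve(question: string, k = 2): string[] {
  const queryWords = new Set(
    question.toLowerCase().split(/\W+/).filter(Boolean),
  );
  return facts
    .map((fact) => ({
      fact,
      // Count how many words of the fact appear in the question.
      score: fact
        .toLowerCase()
        .split(/\W+/)
        .filter((word) => queryWords.has(word)).length,
    }))
    .sort((a, b) => b.score - a.score)
    .slice(0, k)
    .map((scored) => scored.fact);
}

function buildPrompt(question: string): string {
  const context = retrieve(question).join("\n");
  return `Answer using only this context:\n${context}\n\nQuestion: ${question}`;
}

// The resulting string would be sent as the user message to the in-browser model.
console.log(buildPrompt("What frameworks do you use?"));
```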