
Running LLMs in the browser
I'm developing a frontend portfolio website with an AI chatbot that can answer questions about me.
The main challenge I’m facing is that I don’t want to invest in hosting a backend server that makes calls to open-source LLMs.
My plan is to use a library like WebLLM or react-llm to run a small LLM directly in the browser. The approach downloads a significant amount of data (around 300 MB of model weights) into the browser, and then WebGPU in modern browsers runs the model. I successfully implemented this with SmolLM2-360M and the WebLLM library, but the model is too small and the chatbot's responses are not relevant.
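For reference, my setup looks roughly like this (a minimal sketch, not my exact code; the model ID and the shape of the progress callback are assumptions based on WebLLM's prebuilt model list and may differ by version):

```typescript
import { CreateMLCEngine } from "@mlc-ai/web-llm";

async function main() {
  // Downloads the model weights (cached by the browser) and compiles
  // the runtime for WebGPU. The model ID below is an assumption.
  const engine = await CreateMLCEngine("SmolLM2-360M-Instruct-q4f16_1-MLC", {
    initProgressCallback: (report) => console.log(report.text),
  });

  // OpenAI-style chat completion, running entirely in the browser.
  const reply = await engine.chat.completions.create({
    messages: [
      { role: "system", content: "You answer questions about my portfolio." },
      { role: "user", content: "What technologies do you work with?" },
    ],
  });
  console.log(reply.choices[0].message.content);
}

main();
```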
I need assistance either with preventing hallucinations by incorporating RAG techniques, or with alternative ideas, such as cost-effective backend hosting options or other methods for running LLMs in the browser.
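To make concrete what I mean by incorporating RAG on the client side, this is the kind of thing I'm imagining (a naive keyword-overlap sketch; the facts, function names, and prompt wording are placeholders, and a real setup would probably use an embedding model, e.g. via transformers.js):

```typescript
// Score a handful of facts about me by keyword overlap with the question,
// then prepend the best matches to the prompt so the small model answers
// from real context instead of hallucinating.
const facts = [
  "I am a frontend developer focused on React and TypeScript.",
  "My portfolio site runs an LLM chatbot entirely in the browser via WebGPU.",
  "I have built projects with WebLLM and SmolLM2-360M.",
];

function retrieve(question: string, k = 2): string[] {
  const queryWords = new Set(
    question.toLowerCase().split(/\W+/).filter(Boolean),
  );
  return facts
    .map((fact) => ({
      fact,
      // Count how many words of the fact appear in the question.
      score: fact
        .toLowerCase()
        .split(/\W+/)
        .filter((word) => queryWords.has(word)).length,
    }))
    .sort((a, b) => b.score - a.score)
    .slice(0, k)
    .map((scored) => scored.fact);
}

function buildPrompt(question: string): string {
  const context = retrieve(question).join("\n");
  return `Answer using only this context:\n${context}\n\nQuestion: ${question}`;
}

// The resulting string would be sent as the user message to the in-browser model.
console.log(buildPrompt("What frameworks do you use?"));
```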