WebLLM needs WebGPU to run LLMs locally.
Use Chrome 113+ or Edge 113+ to try this demo.
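WebLLM cannot run without WebGPU, so a page can check for it before downloading a model. The sketch below is a minimal feature check that relies only on the standard `navigator.gpu.requestAdapter()` entry point. `hasWebGPU` and the `GpuLike` interface are hypothetical names defined here so the snippet compiles without the `@webgpu/types` package. The browser version alone is not enough: `navigator.gpu` can exist while `requestAdapter()` still resolves to `null`, for example on an unsupported GPU.

```typescript
// Minimal structural type so the sketch compiles without @webgpu/types.
// (Hypothetical name; the real WebGPU type is GPU from @webgpu/types.)
interface GpuLike {
  requestAdapter(): Promise<unknown | null>;
}

/**
 * Resolves to true only if the WebGPU API is exposed AND an adapter is available.
 * Any exception from requestAdapter() is treated as "no usable WebGPU".
 */
async function hasWebGPU(gpu: GpuLike | undefined): Promise<boolean> {
  if (!gpu) {
    return false; // API not exposed by this browser
  }
  try {
    const adapter = await gpu.requestAdapter();
    return adapter !== null; // null means no suitable GPU was found
  } catch {
    return false;
  }
}
```

In a browser with `@webgpu/types` installed, you would call `hasWebGPU(navigator.gpu)` and show the Chrome/Edge 113+ notice when it resolves to `false`.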
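`engine.chat.completions.create()` mirrors the OpenAI SDK's chat completions call: requests take the same `messages` array, and replies come back in the same `choices` structure.

| Model | Best for | Download size | VRAM needed |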
|---|---|---|---|
| Llama 3.2 1B | First test, fast | ~700 MB | 2 GB |
| Gemma 2 2B | Balanced quality | ~1.5 GB | 3 GB |
| Phi-3.5 Mini | Strong reasoning | ~2.2 GB | 4 GB |
| Llama 3.2 3B | Higher quality | ~2.0 GB | 4 GB |
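Because `engine.chat.completions.create()` follows the OpenAI request and response shape, code written against that shape does not care which model from the table is loaded. The sketch below is a minimal illustration under stated assumptions: `ChatCompletionsLike`, `ChatRequest`, `ChatResponse`, and `ask` are hypothetical, hand-written subsets of the OpenAI types, not WebLLM's own exports, so they can be exercised with a mock engine. With the real `@mlc-ai/web-llm` package, the engine would come from `CreateMLCEngine(modelId, ...)`, where `modelId` is the exact ID string from WebLLM's prebuilt model list. You may need the library's own types instead of these stand-ins.

```typescript
// Hand-written subset of the OpenAI chat types (hypothetical stand-ins).
type Role = "system" | "user" | "assistant";
interface ChatMessage { role: Role; content: string; }
interface ChatRequest { messages: ChatMessage[]; temperature?: number; max_tokens?: number; }
interface ChatResponse { choices: { message: { role: string; content: string | null } }[]; }

// Anything that exposes chat.completions.create() in the OpenAI shape.
interface ChatCompletionsLike {
  chat: { completions: { create(req: ChatRequest): Promise<ChatResponse> } };
}

// Sends one user question and returns the first choice's text ("" if none).
async function ask(engine: ChatCompletionsLike, question: string): Promise<string> {
  const reply = await engine.chat.completions.create({
    messages: [
      { role: "system", content: "You are a concise assistant." },
      { role: "user", content: question },
    ],
    temperature: 0.7,
    max_tokens: 256,
  });
  // Same access path you would use with the OpenAI SDK.
  return reply.choices[0]?.message.content ?? "";
}

// In the browser (assumes @mlc-ai/web-llm; not executed here):
//   import { CreateMLCEngine } from "@mlc-ai/web-llm";
//   const engine = await CreateMLCEngine(modelId, {
//     initProgressCallback: (report) => console.log(report.text),
//   });
//   console.log(await ask(engine, "What is WebGPU?"));
```

Starting with the smallest model in the table (Llama 3.2 1B) keeps the first download and VRAM requirement low while you confirm that the call path works.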