How to Call the OpenAI API with Vanilla JavaScript (No Backend Needed)

The single most common question from frontend developers discovering AI: “Do I need a backend to use the OpenAI API?”

The short answer is no — not for learning, prototyping, or personal projects. You can call the OpenAI API directly from plain JavaScript using the fetch API you already know. No Node.js, no Express, no frameworks.

This tutorial covers everything from your first API call to production-ready patterns with streaming, error handling, and rate limit management.

Security note first: For public-facing production apps, you should never expose your API key in client-side code. We cover the correct approach at the end of this article. For personal projects, local development, and learning — the direct approach shown here is perfectly fine.

Prerequisites

Before writing any code, you need:

An OpenAI account and API key (starts with sk-)
A browser (any modern browser works)
A basic HTML file — that is genuinely all

No npm install. No build step. No configuration files.

Your First API Call

The OpenAI chat completions endpoint accepts a POST request with a JSON body. Here is the minimal working version:

const response = await fetch('https://api.openai.com/v1/chat/completions', {
  method: 'POST',
  headers: {
    'Content-Type': 'application/json',
    'Authorization': `Bearer YOUR_API_KEY_HERE`
  },
  body: JSON.stringify({
    model: 'gpt-4o-mini',       // cheapest capable model
    max_tokens: 500,
    messages: [
      {
        role: 'user',
        content: 'Explain CSS flexbox in two sentences.'
      }
    ]
  })
});

const data = await response.json();
console.log(data.choices[0].message.content);

That is the entire API call. Paste it into your browser’s DevTools console (with your real API key), run it, and you will see a GPT response within a couple of seconds.
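The response object carries more than the reply text. As a rough sketch, the helper below pulls out the fields a UI usually cares about: the text, whether the reply was cut off by max_tokens, and the token count. The name summarizeResponse and the sample values are illustrative assumptions, not part of any OpenAI library.

```javascript
// Hypothetical helper: reduce a chat completions response to a few useful fields.
function summarizeResponse(data) {
  const choice = data?.choices?.[0];
  if (!choice) {
    // Error responses carry an `error` object instead of `choices`.
    throw new Error(data?.error?.message ?? 'Response contained no choices');
  }
  return {
    text: choice.message?.content ?? '',
    // finish_reason is 'length' when the reply was cut off by max_tokens.
    truncated: choice.finish_reason === 'length',
    totalTokens: data.usage?.total_tokens ?? null
  };
}

// Sample object shaped like a real response (the values are made up).
const sample = {
  choices: [
    {
      message: { role: 'assistant', content: 'Flexbox lays out items in a row or column.' },
      finish_reason: 'stop'
    }
  ],
  usage: { prompt_tokens: 14, completion_tokens: 11, total_tokens: 25 }
};

console.log(summarizeResponse(sample));
```

If truncated comes back true, raising max_tokens or asking for a shorter answer are the usual remedies.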

Understanding the Request Structure

The `messages` array

The messages array is the core of the API. Each message has a role and content:

messages: [
  // System message — sets the AI's behaviour and persona
  {
    role: 'system',
    content: 'You are a concise frontend development tutor. Answer in plain English without jargon.'
  },
  // Previous conversation turns (for multi-turn chat)
  {
    role: 'user',
    content: 'What is the box model?'
  },
  {
    role: 'assistant',
    content: 'The box model describes how every HTML element is a rectangular box...'
  },
  // The new user message
  {
    role: 'user',
    content: 'Now explain margin collapsing.'
  }
]

Three roles:

system — instructions for the AI (optional but powerful)
user — what the human said
assistant — what the AI previously replied (used to maintain conversation history)
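To make the ordering concrete, here is a minimal sketch of a function that assembles the array from its three parts. buildMessages is a hypothetical helper introduced only for illustration; the API itself just expects the finished array.

```javascript
// Hypothetical helper: assemble a messages array in the order shown above.
function buildMessages(systemPrompt, history, userText) {
  const messages = [];

  // The system message is optional and goes first when present.
  if (systemPrompt) messages.push({ role: 'system', content: systemPrompt });

  // Earlier turns must alternate between the two conversational roles.
  for (const turn of history) {
    if (turn.role !== 'user' && turn.role !== 'assistant') {
      throw new Error(`Unexpected role in history: ${turn.role}`);
    }
    messages.push({ role: turn.role, content: turn.content });
  }

  // The new user message always goes last.
  messages.push({ role: 'user', content: userText });
  return messages;
}

const messages = buildMessages(
  'You are a concise frontend development tutor.',
  [
    { role: 'user', content: 'What is the box model?' },
    { role: 'assistant', content: 'Every element is a rectangular box...' }
  ],
  'Now explain margin collapsing.'
);
console.log(messages.map(m => m.role).join(' > '));
// prints "system > user > assistant > user"
```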

Choosing a model

Model	Speed	Cost	Best for
`gpt-4o-mini`	Fast	Very cheap	Most tasks, learning, prototyping
`gpt-4o`	Medium	Moderate	Complex reasoning, code generation
`gpt-4-turbo`	Slower	Expensive	Older model, largely superseded by gpt-4o

For this tutorial, gpt-4o-mini is the right choice — it is fast, cheap, and more than capable for typical frontend use cases.

A Complete, Working HTML Page

Here is a full, self-contained page you can save as index.html and open directly in a browser:

<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="UTF-8">
  <title>OpenAI API Demo</title>
  <style>
    body {
      font-family: system-ui, sans-serif;
      max-width: 640px;
      margin: 40px auto;
      padding: 0 20px;
      background: #0d1117;
      color: #c4d4ed;
    }
    textarea, button {
      width: 100%;
      font-family: inherit;
      font-size: 14px;
    }
    textarea {
      background: #161c2d;
      border: 1px solid rgba(255,255,255,.1);
      border-radius: 8px;
      padding: 12px;
      color: #f0f6ff;
      resize: vertical;
      min-height: 80px;
      outline: none;
      margin-bottom: 10px;
    }
    button {
      background: linear-gradient(135deg, #5b9cf6, #06d6b0);
      border: none;
      border-radius: 8px;
      padding: 12px;
      color: #fff;
      font-weight: 700;
      cursor: pointer;
    }
    button:disabled { opacity: .5; cursor: not-allowed }
    #output {
      margin-top: 20px;
      background: #161c2d;
      border: 1px solid rgba(255,255,255,.08);
      border-radius: 8px;
      padding: 16px;
      line-height: 1.75;
      min-height: 60px;
      white-space: pre-wrap;
    }
    .error { color: #f87171 }
  </style>
</head>
<body>
  <h2 style="color:#f0f6ff;margin-bottom:20px">OpenAI API — Vanilla JS</h2>

  <textarea id="prompt" placeholder="Ask anything…">Explain CSS Grid in 3 bullet points.</textarea>
  <button id="btn" onclick="ask()">Ask GPT</button>
  <div id="output">Response will appear here…</div>

  <script>
    const API_KEY = 'YOUR_API_KEY_HERE'; // ← replace this

    async function ask() {
      const prompt = document.getElementById('prompt').value.trim();
      const output = document.getElementById('output');
      const btn    = document.getElementById('btn');

      if (!prompt) return;

      btn.disabled     = true;
      btn.textContent  = 'Thinking…';
      output.textContent = '';
      output.className = '';

      try {
        const res = await fetch('https://api.openai.com/v1/chat/completions', {
          method: 'POST',
          headers: {
            'Content-Type':  'application/json',
            'Authorization': `Bearer ${API_KEY}`
          },
          body: JSON.stringify({
            model:      'gpt-4o-mini',
            max_tokens: 800,
            messages: [
              {
                role:    'system',
                content: 'You are a helpful frontend development tutor. Be concise and practical.'
              },
              { role: 'user', content: prompt }
            ]
          })
        });

        if (!res.ok) {
          const err = await res.json();
          throw new Error(err.error?.message ?? `HTTP ${res.status}`);
        }

        const data    = await res.json();
        output.textContent = data.choices[0].message.content;

      } catch (err) {
        output.textContent = `Error: ${err.message}`;
        output.className   = 'error';
      } finally {
        btn.disabled    = false;
        btn.textContent = 'Ask GPT';
      }
    }

    // Allow Ctrl+Enter to submit
    document.getElementById('prompt').addEventListener('keydown', e => {
      if (e.key === 'Enter' && e.ctrlKey) ask();
    });
  </script>
</body>
</html>

Save it, replace YOUR_API_KEY_HERE with your real key, and open it in any browser. No server needed.

Adding Streaming Responses

The basic approach above waits for the full response before showing anything. For a ChatGPT-style experience where text appears as it generates, add stream: true and read the response body as a ReadableStream:

async function askStreaming(prompt, outputEl) {
  const res = await fetch('https://api.openai.com/v1/chat/completions', {
    method: 'POST',
    headers: {
      'Content-Type':  'application/json',
      'Authorization': `Bearer ${API_KEY}`
    },
    body: JSON.stringify({
      model:      'gpt-4o-mini',
      max_tokens: 800,
      stream:     true,           // ← enable streaming
      messages:   [{ role: 'user', content: prompt }]
    })
  });

  if (!res.ok) throw new Error(`API error ${res.status}`);

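  const reader  = res.body.getReader();
  const decoder = new TextDecoder();
  let buffer    = '';

  while (true) {
    const { done, value } = await reader.read();
    if (done) break;

    // A network chunk can end mid-line, so buffer the text and parse only
    // complete lines. { stream: true } keeps multi-byte characters intact
    // when they are split across chunks.
    buffer += decoder.decode(value, { stream: true });
    const lines = buffer.split('\n');
    buffer = lines.pop(); // keep the trailing partial line for the next read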

    for (const line of lines) {
      if (!line.startsWith('data: ')) continue;

      const raw = line.slice(6).trim();
      if (raw === '[DONE]') return;

      try {
        const json  = JSON.parse(raw);
        const token = json.choices[0]?.delta?.content ?? '';
        if (token) {
          outputEl.textContent += token;
          // Scroll to bottom in a chat UI
          outputEl.scrollTop = outputEl.scrollHeight;
        }
      } catch {
        // Malformed chunk — skip it
      }
    }
  }
}

The key difference from the basic version:

stream: true in the request body
Reading res.body.getReader() instead of res.json()
Parsing each data: {...} line individually
Extracting delta.content instead of message.content
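Network chunks do not always end on a line boundary, so a single data: line can arrive in two pieces. The sketch below isolates the parsing step as a small function that holds back any partial line until the rest arrives, which also makes it testable without a network. createStreamParser is a hypothetical name, and the two input strings are hand-written stand-ins for real chunks.

```javascript
// Hypothetical helper: turns raw text chunks into content tokens,
// buffering any incomplete line until the next chunk completes it.
function createStreamParser() {
  let buffer = '';
  let finished = false;

  return function feed(chunkText) {
    const tokens = [];
    if (finished) return tokens;

    buffer += chunkText;
    const lines = buffer.split('\n');
    buffer = lines.pop(); // the last piece may be an incomplete line

    for (const line of lines) {
      const trimmed = line.trim();
      if (!trimmed.startsWith('data:')) continue;

      const raw = trimmed.slice(5).trim();
      if (raw === '[DONE]') { finished = true; break; }

      try {
        const token = JSON.parse(raw).choices?.[0]?.delta?.content;
        if (token) tokens.push(token);
      } catch {
        // Not valid JSON: skip this line
      }
    }
    return tokens;
  };
}

// The second data line is deliberately split across the two chunks.
const feed = createStreamParser();
const first  = feed('data: {"choices":[{"delta":{"content":"Hel"}}]}\n\ndata: {"choices":[{"del');
const second = feed('ta":{"content":"lo"}}]}\n\ndata: [DONE]\n\n');
console.log(first.concat(second).join('')); // prints "Hello"
```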

Maintaining Conversation History

A single API call has no memory. To build a multi-turn chat, you must send the full conversation history with every request:

const conversation = [
  {
    role:    'system',
    content: 'You are a helpful frontend development tutor.'
  }
];

async function chat(userMessage, outputEl) {
  // Add the new user message to history
  conversation.push({ role: 'user', content: userMessage });

  const res = await fetch('https://api.openai.com/v1/chat/completions', {
    method: 'POST',
    headers: {
      'Content-Type':  'application/json',
      'Authorization': `Bearer ${API_KEY}`
    },
    body: JSON.stringify({
      model:    'gpt-4o-mini',
      messages: conversation      // ← full history every time
    })
  });

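  if (!res.ok) {
    // Drop the unanswered user message so the history stays consistent
    conversation.pop();
    throw new Error(`API error ${res.status}`);
  }

  const data  = await res.json();
  const reply = data.choices[0].message.content;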

  // Add AI reply to history so next turn includes it
  conversation.push({ role: 'assistant', content: reply });

  outputEl.textContent = reply;
}

Context limit: Models have a maximum context window (how much text they can process at once). gpt-4o-mini supports 128,000 tokens — roughly 96,000 words. For most chat applications, you will hit cost limits long before the context limit. Trim old messages from the history array if conversations become very long.
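One simple way to trim, sketched below, is to keep the system message and only the most recent turns. trimHistory and its default of 20 turns are illustrative assumptions; counting messages is a crude stand-in for counting tokens, which would need a tokenizer.

```javascript
// Hypothetical helper: keep the system message plus the most recent turns.
function trimHistory(conversation, maxTurns = 20) {
  const system = conversation.filter(m => m.role === 'system');
  const turns  = conversation.filter(m => m.role !== 'system');

  // slice(-0) would return everything, so handle zero explicitly.
  let recent = maxTurns > 0 ? turns.slice(-maxTurns) : [];

  // Avoid starting the kept history with an assistant reply whose
  // question has been cut away.
  if (recent.length && recent[0].role === 'assistant') recent = recent.slice(1);

  return [...system, ...recent];
}

const history = [
  { role: 'system', content: 'You are a helpful frontend development tutor.' },
  { role: 'user', content: 'Q1' }, { role: 'assistant', content: 'A1' },
  { role: 'user', content: 'Q2' }, { role: 'assistant', content: 'A2' },
  { role: 'user', content: 'Q3' }, { role: 'assistant', content: 'A3' }
];
console.log(trimHistory(history, 3).map(m => m.content));
```

The function returns a new array rather than mutating its input, so the full history can still be kept for display while only the trimmed copy is sent.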

Handling Errors Properly

The OpenAI API returns structured error objects you should handle gracefully:

async function callAPI(messages) {
  let res;
  try {
    res = await fetch('https://api.openai.com/v1/chat/completions', {
      method:  'POST',
      headers: {
        'Content-Type':  'application/json',
        'Authorization': `Bearer ${API_KEY}`
      },
      body: JSON.stringify({ model: 'gpt-4o-mini', messages })
    });
  } catch (networkErr) {
    // No internet, DNS failure, CORS issue
    throw new Error('Network error — check your connection.');
  }

  if (!res.ok) {
    const body = await res.json().catch(() => ({}));
    const msg  = body.error?.message ?? `HTTP ${res.status}`;

    switch (res.status) {
      case 401: throw new Error('Invalid API key. Check your credentials.');
      case 429: throw new Error('Rate limit hit. Wait a moment and try again.');
      case 500: throw new Error('OpenAI server error. Try again shortly.');
      default:  throw new Error(msg);
    }
  }

  return res.json();
}

The four errors you will encounter most:

Status	Meaning	Fix
`401`	Bad API key	Double-check the key, no extra spaces
`429`	Rate limit / quota exceeded	Wait, or upgrade your plan
`500`	OpenAI server error	Retry with exponential backoff
Network	CORS / no internet	Use a proxy for production
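One way to act on the table above is to separate failures worth retrying from those that are not. The sketch below is illustrative only: describeFailure is a hypothetical helper, and treating every 5xx status and every 429 as retryable is an assumption (a 429 caused by an exhausted quota will not clear on its own).

```javascript
// Hypothetical helper: map an HTTP status (or null for a network failure)
// to a user-facing message and a hint about whether retrying makes sense.
function describeFailure(status) {
  if (status === null) {
    return { message: 'Network error. Check your connection.', retryable: true };
  }
  if (status === 401) {
    return { message: 'Invalid API key. Check your credentials.', retryable: false };
  }
  if (status === 429) {
    // Assumes a rate limit rather than an exhausted quota.
    return { message: 'Rate limit hit. Wait a moment and try again.', retryable: true };
  }
  if (status >= 500) {
    return { message: 'OpenAI server error. Try again shortly.', retryable: true };
  }
  return { message: `HTTP ${status}`, retryable: false };
}

for (const status of [401, 429, 500, null]) {
  const { message, retryable } = describeFailure(status);
  console.log(status, retryable ? '(retry)' : '(give up)', message);
}
```

Returning a flag alongside the message means retry code can check retryable directly instead of matching on the text of an error message.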

Rate Limiting and Retry Logic

For production-quality code, add automatic retry on 429 errors:

async function callWithRetry(messages, retries = 3) {
  for (let attempt = 0; attempt < retries; attempt++) {
    try {
      return await callAPI(messages);
    } catch (err) {
      const isRateLimit = err.message.includes('Rate limit');
      const isLast      = attempt === retries - 1;

      if (!isRateLimit || isLast) throw err;

      // Exponential backoff: 1s, then 2s (with retries = 3, the third failure is thrown instead)
      const delay = 1000 * Math.pow(2, attempt);
      console.warn(`Rate limited. Retrying in ${delay}ms…`);
      await new Promise(r => setTimeout(r, delay));
    }
  }
}
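A common refinement is to add random jitter so that many clients that were rate limited at the same moment do not all retry at the same moment too. The sketch below uses the "full jitter" variant, where each delay is a random value between zero and the exponential ceiling. backoffDelay, the 30-second cap, and the injectable random option are assumptions made for this example.

```javascript
// Hypothetical helper: exponential backoff with full jitter.
// `random` is injectable so the function can be tested deterministically.
function backoffDelay(attempt, { baseMs = 1000, capMs = 30000, random = Math.random } = {}) {
  // Ceiling doubles each attempt (1s, 2s, 4s, ...) but never exceeds capMs.
  const ceiling = Math.min(capMs, baseMs * 2 ** attempt);
  // Pick a delay uniformly between 0 and the ceiling.
  return Math.floor(random() * ceiling);
}

for (let attempt = 0; attempt < 4; attempt++) {
  console.log(`attempt ${attempt}: wait ${backoffDelay(attempt)}ms`);
}
```

In callWithRetry, this would replace the fixed `1000 * Math.pow(2, attempt)` calculation.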

Keeping Your API Key Secure

The approach in this tutorial (key in client JavaScript) is fine for:

Learning and experimentation
Local development on your own machine
Personal projects only you access

It is not safe for public-facing sites because anyone can read your API key from the browser’s source or Network tab, then use it to run up charges on your account.

For public production apps, the correct pattern is a thin proxy:

Browser → Your Server → OpenAI API

Your server holds the key — the browser never sees it. You can implement this with:

Cloudflare Workers — free tier, deploys in seconds, perfect for W3Tweaks
Vercel Edge Functions — api/chat.js file, zero configuration
Netlify Functions — same idea, netlify/functions/chat.js

A minimal Cloudflare Worker proxy looks like this:

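// Cloudflare Worker. Store the key as a secret: wrangler secret put OPENAI_API_KEY
const CORS_HEADERS = {
  'Access-Control-Allow-Origin':  '*', // restrict to your own origin in production
  'Access-Control-Allow-Methods': 'POST, OPTIONS',
  'Access-Control-Allow-Headers': 'Content-Type'
};

export default {
  async fetch(request, env) {
    // Browsers send a CORS preflight before a cross-origin JSON POST
    if (request.method === 'OPTIONS') {
      return new Response(null, { status: 204, headers: CORS_HEADERS });
    }
    if (request.method !== 'POST') {
      return new Response('Method Not Allowed', { status: 405, headers: CORS_HEADERS });
    }

    const body = await request.json();

    const res = await fetch('https://api.openai.com/v1/chat/completions', {
      method:  'POST',
      headers: {
        'Content-Type':  'application/json',
        'Authorization': `Bearer ${env.OPENAI_API_KEY}`  // ← secret
      },
      body: JSON.stringify(body)
    });

    // Forward the response (including streaming) and its status back to the browser
    return new Response(res.body, {
      status:  res.status,
      headers: {
        ...CORS_HEADERS,
        'Content-Type': res.headers.get('Content-Type') ?? 'application/json'
      }
    });
  }
};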

Your frontend then calls https://your-worker.workers.dev/ instead of api.openai.com directly — and your key never leaves the server.
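On the browser side, the change is small: the URL points at the proxy and the Authorization header disappears. The sketch below assumes the proxy forwards the OpenAI response unchanged; PROXY_URL is a placeholder, and askViaProxy with its injectable fetchImpl parameter is a hypothetical helper written this way so it can be exercised without a network.

```javascript
// Placeholder: substitute the address of your own deployed proxy.
const PROXY_URL = 'https://your-worker.workers.dev/';

// Hypothetical helper: send a prompt through the proxy and return the reply text.
async function askViaProxy(prompt, fetchImpl = fetch) {
  const res = await fetchImpl(PROXY_URL, {
    method: 'POST',
    // No Authorization header: the proxy adds the key on the server side.
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({
      model: 'gpt-4o-mini',
      messages: [{ role: 'user', content: prompt }]
    })
  });

  if (!res.ok) throw new Error(`Proxy error ${res.status}`);

  const data = await res.json();
  return data.choices[0].message.content;
}
```

A call such as `askViaProxy('Explain CSS Grid in 3 bullet points.')` then resolves to the reply text, just as the direct version did.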

Useful Parameters to Know

body: JSON.stringify({
  model:       'gpt-4o-mini',
  messages:    [...],
  max_tokens:  500,         // cap response length (default: bounded only by the model's own limit)
  temperature: 0.7,         // creativity: 0 = deterministic, 2 = very random
  top_p:       1,           // alternative to temperature — usually leave at 1
  stream:      true,        // enable token-by-token streaming
  n:           1,           // number of responses to generate
  stop:        ['\n\n'],    // stop generating at this sequence
  user:        'user-123',  // for abuse monitoring — your app's user ID
})

temperature is the one you will tune most often:

0.0: close to deterministic, so the same prompt usually gets the same answer. Good for code, facts.
0.7: balanced creativity. Good for explanations and chat.
1.2: more varied and creative. Good for writing and brainstorming.
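If an app switches between these settings often, naming them can be easier to read than scattering numbers through the code. A minimal sketch, in which the preset names and the buildRequestBody helper are purely illustrative:

```javascript
// Hypothetical presets based on the three temperature values above.
const TEMPERATURE_PRESETS = Object.freeze({
  precise:  0.0,
  balanced: 0.7,
  creative: 1.2
});

// Hypothetical helper: build a request body for a named preset.
function buildRequestBody(messages, preset = 'balanced') {
  const temperature = TEMPERATURE_PRESETS[preset];
  if (typeof temperature !== 'number') {
    throw new Error(`Unknown preset: ${preset}`);
  }
  return { model: 'gpt-4o-mini', messages, temperature };
}

const body = buildRequestBody(
  [{ role: 'user', content: 'Suggest five names for a CSS framework.' }],
  'creative'
);
console.log(body.temperature); // prints 1.2
```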

Key Takeaways

The OpenAI API is a standard HTTPS endpoint: anything that can send an HTTP request, including the browser's fetch, can call it
Use gpt-4o-mini for most tasks — it is fast, cheap, and highly capable
Add stream: true and read res.body.getReader() for ChatGPT-style streaming
Send the full conversation array on every request to maintain chat history
Handle 401, 429, and 500 errors explicitly — they all need different responses
Use exponential backoff when retrying rate-limited requests
For production: proxy API calls through a Cloudflare Worker, Vercel, or Netlify function — never expose your key in client-side code