In the era of Generative AI, users expect chat interfaces to feel alive. They don’t want to stare at a loading spinner for ten seconds while a Large Language Model (LLM) generates a complete paragraph. They want to see the text appear token by token, just like a human typing.
This “typing effect” isn’t just a UI trick; it’s a fundamental shift in how we handle data using Streaming APIs. In this guide, we’ll explore how to build a real-time AI chat application using JavaScript and Server-Sent Events (SSE).
The Problem: Request/Response vs. Streaming
Traditionally, web APIs work on a Request/Response model:
1. The client sends a prompt.
2. The server processes the entire prompt.
3. The server sends back the entire response.
With LLMs, step 2 can take a long time. If an answer is 500 words long, the user waits for the whole generation.
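A back-of-the-envelope calculation shows why this matters. The numbers below (20 tokens per second, a 500-token answer) are illustrative assumptions, not benchmarks:

```javascript
// Illustrative latency math: how long until the user sees *anything*?
const tokensPerSecond = 20; // assumed generation speed
const answerTokens = 500;   // assumed answer length

// Request/response: the user waits for the entire generation.
const waitForFullResponse = answerTokens / tokensPerSecond; // 25 seconds

// Streaming: the user sees output after the first token.
const waitForFirstToken = 1 / tokensPerSecond; // 0.05 seconds

console.log(waitForFullResponse, waitForFirstToken);
```

The total generation time is identical; streaming only changes the *perceived* latency, which is what users actually feel.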
Streaming changes this:
1. The client sends a prompt.
2. The server starts processing.
3. As soon as the server generates a “chunk” (a token or word), it sends it immediately.
4. The client renders chunks as they arrive.
The Tech Stack
- Frontend: Vanilla JavaScript (or React/Vue/Svelte)
- Backend: Node.js (Express or Edge Functions)
- Protocol: Server-Sent Events (SSE)
- AI Provider: OpenAI API (or Anthropic/Gemini)
Step 1: The Backend (Node.js)
We need an endpoint that doesn’t close the connection immediately. We will use the OpenAI Node SDK, which supports streaming out of the box.
// server.js
import express from 'express';
import OpenAI from 'openai';
import cors from 'cors';
const app = express();
const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });
app.use(cors());
app.use(express.json());
app.post('/chat', async (req, res) => {
const { message, conversationHistory = [] } = req.body;
// Set headers for Server-Sent Events
res.setHeader('Content-Type', 'text/event-stream');
res.setHeader('Cache-Control', 'no-cache');
res.setHeader('Connection', 'keep-alive');
try {
// Build messages array with conversation history
const messages = [
...conversationHistory,
{ role: 'user', content: message }
];
const stream = await openai.chat.completions.create({
model: 'gpt-4',
messages: messages,
stream: true, // This is crucial!
});
for await (const chunk of stream) {
const content = chunk.choices[0]?.delta?.content || '';
if (content) {
// Format data as SSE
res.write(`data: ${JSON.stringify({ content })}\n\n`);
}
}
res.write('data: [DONE]\n\n');
res.end();
} catch (error) {
console.error('Error:', error);
res.write(`data: ${JSON.stringify({ error: error.message })}\n\n`);
res.end();
}
});
app.listen(3000, () => console.log('Server running on port 3000'));
Key Takeaways
- Headers: Content-Type: text/event-stream tells the browser to keep the connection open.
- Looping: We iterate over the stream object provided by the SDK.
- Formatting: SSE requires each message to start with data: and end with \n\n.
- Context: We include conversationHistory to maintain context across messages.
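The formatting rule is easy to get wrong, so it is worth isolating. A tiny helper (hypothetical, not part of any SDK) keeps the server-side writes consistent:

```javascript
// Hypothetical helper: wrap a JSON payload as a single SSE event.
// Each event must start with "data: " and end with a blank line ("\n\n").
function sseEvent(payload) {
  return `data: ${JSON.stringify(payload)}\n\n`;
}

// The server's write loop then becomes:
//   res.write(sseEvent({ content }));
console.log(sseEvent({ content: 'Hello' }));
```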
Step 2: The Frontend (Client-Side JavaScript)
On the client, we can’t simply await response.json() for a complete payload. Instead, we read the ReadableStream exposed by the fetch API’s response body.
// client.js
async function sendMessage(userMessage) {
const chatBox = document.getElementById('chat-box');
// Create a placeholder for the AI response
const aiMessageElement = document.createElement('div');
aiMessageElement.className = 'ai-message';
chatBox.appendChild(aiMessageElement);
const response = await fetch('http://localhost:3000/chat', {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({
message: userMessage,
conversationHistory: getConversationHistory()
}),
});
const reader = response.body.getReader();
const decoder = new TextDecoder();
let buffer = '';
while (true) {
const { done, value } = await reader.read();
if (done) break;
// A network read can end mid-event, so buffer until a full "\n\n"-terminated event arrives
buffer += decoder.decode(value, { stream: true });
const events = buffer.split('\n\n');
buffer = events.pop(); // keep the (possibly incomplete) tail for the next read
for (const event of events) {
if (event.startsWith('data: ')) {
const dataStr = event.slice(6);
if (dataStr === '[DONE]') return;
try {
const data = JSON.parse(dataStr);
// Append the new token to the UI
aiMessageElement.textContent += data.content;
// Auto-scroll to bottom
chatBox.scrollTop = chatBox.scrollHeight;
} catch (e) {
console.error('Error parsing JSON', e);
}
}
}
}
}
How it works
- response.body.getReader(): Locks the stream to a single reader.
- reader.read(): Reads the next available chunk of binary data.
- TextDecoder: Converts the binary data into a string.
- Parsing: We strip the data: prefix and parse the JSON to get the actual text content.
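One subtlety worth underlining: a single network read is not guaranteed to contain whole events — a chunk can end in the middle of a data: line. A production parser should buffer text and only consume events that are fully terminated. A small pure function (hypothetical, not from any SDK) makes the idea testable:

```javascript
// Consume only complete SSE events ("data: ..." terminated by "\n\n")
// from a text buffer; return the extracted payloads and the leftover tail.
function extractSSEEvents(buffer) {
  const events = [];
  let idx;
  while ((idx = buffer.indexOf('\n\n')) !== -1) {
    const raw = buffer.slice(0, idx);
    buffer = buffer.slice(idx + 2);
    if (raw.startsWith('data: ')) {
      events.push(raw.slice(6)); // strip the "data: " prefix
    }
  }
  return { events, rest: buffer };
}

// A chunk that ends mid-event leaves the tail in `rest`;
// a later chunk appended to that tail completes it.
console.log(extractSSEEvents('data: {"content":"Hel'));
console.log(extractSSEEvents('data: {"content":"Hello"}\n\ndata: [DONE]\n\n'));
```

In the read loop, you would accumulate decoded text into a buffer, call this function, process `events`, and carry `rest` over to the next read.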
Handling Edge Cases
When building production apps, consider these challenges:
1. Markdown Rendering
Raw text looks boring. Use a library like marked to render markdown as the text streams in. (Note that marked does not sanitize its output; in production, run the generated HTML through a sanitizer such as DOMPurify before assigning it to innerHTML.)
import { marked } from 'marked';
async function sendMessage(userMessage) {
const chatBox = document.getElementById('chat-box');
const aiMessageElement = document.createElement('div');
aiMessageElement.className = 'ai-message';
chatBox.appendChild(aiMessageElement);
let accumulatedText = '';
const response = await fetch('http://localhost:3000/chat', {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({ message: userMessage }),
});
const reader = response.body.getReader();
const decoder = new TextDecoder();
let buffer = '';
while (true) {
const { done, value } = await reader.read();
if (done) break;
// Buffer across reads so an SSE event split between chunks is not lost
buffer += decoder.decode(value, { stream: true });
const events = buffer.split('\n\n');
buffer = events.pop();
for (const event of events) {
if (event.startsWith('data: ')) {
const dataStr = event.slice(6);
if (dataStr === '[DONE]') {
// Final render with complete markdown
aiMessageElement.innerHTML = marked.parse(accumulatedText);
return;
}
try {
const data = JSON.parse(dataStr);
accumulatedText += data.content;
// Render markdown incrementally
aiMessageElement.innerHTML = marked.parse(accumulatedText);
chatBox.scrollTop = chatBox.scrollHeight;
} catch (e) {
console.error('Error parsing JSON', e);
}
}
}
}
}
2. Network Interruptions & Retry Logic
async function sendMessageWithRetry(userMessage, maxRetries = 3) {
let attempt = 0;
while (attempt < maxRetries) {
try {
await sendMessage(userMessage);
return; // Success
} catch (error) {
attempt++;
console.error(`Attempt ${attempt} failed:`, error);
if (attempt >= maxRetries) {
showError('Failed to get response. Please try again.');
throw error;
}
// Exponential backoff
await new Promise(resolve => setTimeout(resolve, Math.pow(2, attempt) * 1000));
}
}
}
function showError(message) {
const chatBox = document.getElementById('chat-box');
const errorElement = document.createElement('div');
errorElement.className = 'error-message';
errorElement.textContent = message;
chatBox.appendChild(errorElement);
}
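The delay schedule inside the retry loop is easier to reason about (and to test) as a standalone helper — a hypothetical refactor, not a requirement:

```javascript
// Exponential backoff: 2^attempt * base, matching the retry loop above.
function backoffDelayMs(attempt, baseMs = 1000) {
  return Math.pow(2, attempt) * baseMs;
}

console.log([1, 2, 3].map(a => backoffDelayMs(a))); // [2000, 4000, 8000]
```

Many production clients also add random jitter to these delays so simultaneous retries from many clients don't hit the server in lockstep.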
3. Managing Conversation Context
class ConversationManager {
constructor(maxTokens = 4000) {
this.messages = [];
this.maxTokens = maxTokens;
}
addMessage(role, content) {
this.messages.push({ role, content });
this.trimIfNeeded();
}
getHistory() {
return this.messages;
}
trimIfNeeded() {
// Simple token estimation (4 chars ≈ 1 token)
const estimatedTokens = this.messages.reduce((sum, msg) =>
sum + msg.content.length / 4, 0
);
if (estimatedTokens > this.maxTokens) {
// Keep system message and remove oldest user/assistant pairs
const systemMessages = this.messages.filter(m => m.role === 'system');
const otherMessages = this.messages.filter(m => m.role !== 'system');
// Remove oldest messages but keep recent context
const messagesToKeep = otherMessages.slice(-10);
this.messages = [...systemMessages, ...messagesToKeep];
}
}
clear() {
this.messages = [];
}
}
// Usage
const conversation = new ConversationManager();
async function sendMessage(userMessage) {
conversation.addMessage('user', userMessage);
const response = await fetch('http://localhost:3000/chat', {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({
message: userMessage,
conversationHistory: conversation.getHistory()
}),
});
// ... handle streaming
// After receiving complete response
conversation.addMessage('assistant', completeResponse);
}
UI/UX Best Practices for Chat Interfaces
Creating a great chat experience goes beyond just displaying text:
Visual Feedback
function showTypingIndicator() {
const chatBox = document.getElementById('chat-box');
const indicator = document.createElement('div');
indicator.className = 'typing-indicator';
indicator.innerHTML = `
<span></span>
<span></span>
<span></span>
`;
indicator.id = 'typing-indicator';
chatBox.appendChild(indicator);
}
function hideTypingIndicator() {
document.getElementById('typing-indicator')?.remove();
}
async function sendMessage(userMessage) {
showTypingIndicator();
try {
// ... streaming logic
} finally {
hideTypingIndicator();
}
}
CSS for Typing Effect
.typing-indicator {
display: flex;
gap: 4px;
padding: 12px;
background: #f0f0f0;
border-radius: 8px;
width: fit-content;
}
.typing-indicator span {
width: 8px;
height: 8px;
background: #999;
border-radius: 50%;
animation: bounce 1.4s infinite ease-in-out;
}
.typing-indicator span:nth-child(1) {
animation-delay: -0.32s;
}
.typing-indicator span:nth-child(2) {
animation-delay: -0.16s;
}
@keyframes bounce {
0%, 80%, 100% { transform: scale(0); }
40% { transform: scale(1); }
}
.ai-message {
background: #f8f9fa;
padding: 12px 16px;
border-radius: 8px;
margin: 8px 0;
max-width: 80%;
animation: fadeIn 0.3s ease-in;
}
@keyframes fadeIn {
from { opacity: 0; transform: translateY(10px); }
to { opacity: 1; transform: translateY(0); }
}
Abort Streaming
Allow users to stop generation mid-stream:
let currentAbortController = null;
async function sendMessage(userMessage) {
// Cancel any ongoing request
if (currentAbortController) {
currentAbortController.abort();
}
currentAbortController = new AbortController();
const response = await fetch('http://localhost:3000/chat', {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({ message: userMessage }),
signal: currentAbortController.signal
});
// ... streaming logic
}
// Add stop button
document.getElementById('stop-btn').addEventListener('click', () => {
if (currentAbortController) {
currentAbortController.abort();
currentAbortController = null;
}
});
SSE vs WebSockets: Which to Choose?
| Feature | Server-Sent Events (SSE) | WebSockets |
|---|---|---|
| Direction | Server → Client only | Bidirectional |
| Protocol | HTTP | ws:// or wss:// |
| Auto-Reconnect | Built-in | Manual implementation |
| Complexity | Simple | More complex |
| Use Case | AI streaming, notifications | Real-time chat, games |
| Browser Support | Excellent (except IE) | Excellent |
For AI chat applications, SSE is usually the better choice because:
- You only need server-to-client streaming
- Built-in reconnection logic (via the browser’s EventSource; fetch-based streams need manual retry)
- Works over standard HTTP/HTTPS
- Simpler to implement and debug
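If your endpoint can deliver events over GET, the browser’s built-in EventSource gives you that reconnection behavior for free. (This is an assumption about your API shape: EventSource cannot send a POST body, so the prompt would have to travel as a query parameter, unlike the server shown earlier.) Keeping the event handling in a pure function also makes it testable outside the browser:

```javascript
// Pure handler: decide what to do with one SSE data payload.
// `appendText` and `close` are injected so this runs without a DOM.
function handleEventData(data, appendText, close) {
  if (data === '[DONE]') {
    close();
    return;
  }
  const { content } = JSON.parse(data);
  if (content) appendText(content);
}

// Browser wiring (sketch; assumes a hypothetical GET /chat endpoint):
// const source = new EventSource(`/chat?message=${encodeURIComponent(msg)}`);
// source.onmessage = (e) => handleEventData(
//   e.data,
//   (text) => { chatBox.textContent += text; },
//   () => source.close()
// );
```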
When to Use WebSockets
Consider WebSockets when you need:
- Real-time collaboration (multiple users editing)
- Gaming applications
- Truly bidirectional communication
- Lower latency for frequent small messages
Performance Optimization Techniques
1. Debounce User Input
function debounce(func, wait) {
let timeout;
return function executedFunction(...args) {
const later = () => {
clearTimeout(timeout);
func(...args);
};
clearTimeout(timeout);
timeout = setTimeout(later, wait);
};
}
const debouncedSend = debounce(sendMessage, 300);
2. Virtual Scrolling for Long Conversations
For chats with hundreds of messages, render only visible messages:
// Using a library like react-window or virtual-scroller
import { VirtualScroller } from 'virtual-scroller';
const scroller = new VirtualScroller({
container: document.getElementById('chat-box'),
items: messages,
itemHeight: 100,
renderItem: (message) => {
const div = document.createElement('div');
div.className = `message ${message.role}`;
div.innerHTML = marked.parse(message.content);
return div;
}
});
3. Batch DOM Updates
async function streamWithBatching(userMessage) {
const chatBox = document.getElementById('chat-box');
const aiMessageElement = document.createElement('div');
let accumulatedText = '';
let batchBuffer = '';
let lastUpdate = Date.now();
// ... fetch and SSE parsing as before; assume `stream` yields parsed { content } chunks
for await (const chunk of stream) {
batchBuffer += chunk.content;
// Update UI every 50ms or when buffer reaches threshold
if (Date.now() - lastUpdate > 50 || batchBuffer.length > 20) {
accumulatedText += batchBuffer;
aiMessageElement.innerHTML = marked.parse(accumulatedText);
batchBuffer = '';
lastUpdate = Date.now();
}
}
// Final update
if (batchBuffer) {
accumulatedText += batchBuffer;
aiMessageElement.innerHTML = marked.parse(accumulatedText);
}
}
Testing Streaming Endpoints
Backend Testing with Jest
// __tests__/chat.test.js
import request from 'supertest';
import app, { openai } from '../server.js'; // assumes server.js exports both
describe('POST /chat', () => {
it('should stream responses', async () => {
const chunks = [];
const response = await request(app)
.post('/chat')
.send({ message: 'Hello' })
.buffer(false)
.parse((res, callback) => {
res.on('data', (chunk) => {
chunks.push(chunk.toString());
});
res.on('end', () => {
callback(null, chunks);
});
});
expect(chunks.length).toBeGreaterThan(0);
expect(chunks.some(c => c.includes('[DONE]'))).toBe(true);
});
it('should handle errors gracefully', async () => {
// Mock the OpenAI client to throw (assumes `openai` is exported from server.js)
jest.spyOn(openai.chat.completions, 'create')
.mockRejectedValue(new Error('API Error'));
const response = await request(app)
.post('/chat')
.send({ message: 'Hello' });
expect(response.status).toBe(200);
expect(response.text).toContain('error');
});
});
Frontend Testing with Testing Library
// __tests__/chat-ui.test.js
import { render, screen, waitFor } from '@testing-library/react';
import userEvent from '@testing-library/user-event';
import ChatComponent from '../ChatComponent';
// Mock fetch
global.fetch = jest.fn();
describe('ChatComponent', () => {
beforeEach(() => {
fetch.mockClear();
});
it('displays streamed messages', async () => {
const encoder = new TextEncoder();
const mockStream = new ReadableStream({
start(controller) {
// Enqueue bytes, not strings: the component decodes with TextDecoder
controller.enqueue(encoder.encode('data: {"content":"Hello"}\n\n'));
controller.enqueue(encoder.encode('data: {"content":" world"}\n\n'));
controller.enqueue(encoder.encode('data: [DONE]\n\n'));
controller.close();
}
});
fetch.mockResolvedValue({
ok: true,
body: mockStream
});
render(<ChatComponent />);
const input = screen.getByRole('textbox');
await userEvent.type(input, 'Hi');
await userEvent.click(screen.getByRole('button', { name: /send/i }));
await waitFor(() => {
expect(screen.getByText(/Hello world/i)).toBeInTheDocument();
});
});
});
Complete Demo Project Structure
Here’s a production-ready project structure:
ai-chat-app/
├── backend/
│   ├── src/
│   │   ├── controllers/
│   │   │   └── chatController.js
│   │   ├── middleware/
│   │   │   ├── errorHandler.js
│   │   │   └── rateLimit.js
│   │   ├── services/
│   │   │   └── openaiService.js
│   │   ├── utils/
│   │   │   └── streamHelpers.js
│   │   └── server.js
│   ├── __tests__/
│   │   └── chat.test.js
│   ├── package.json
│   └── .env.example
├── frontend/
│   ├── src/
│   │   ├── components/
│   │   │   ├── ChatBox.js
│   │   │   ├── MessageList.js
│   │   │   ├── MessageInput.js
│   │   │   └── TypingIndicator.js
│   │   ├── services/
│   │   │   └── apiClient.js
│   │   ├── hooks/
│   │   │   └── useChat.js
│   │   ├── utils/
│   │   │   ├── conversationManager.js
│   │   │   └── streamParser.js
│   │   ├── styles/
│   │   │   └── chat.css
│   │   └── App.js
│   ├── __tests__/
│   │   └── ChatBox.test.js
│   └── package.json
└── README.md
Example: useChat.js Hook
// frontend/src/hooks/useChat.js
import { useState, useRef } from 'react';
import { ConversationManager } from '../utils/conversationManager';
export function useChat() {
const [messages, setMessages] = useState([]);
const [isStreaming, setIsStreaming] = useState(false);
const conversationRef = useRef(new ConversationManager());
const abortControllerRef = useRef(null);
const sendMessage = async (content) => {
if (isStreaming) return;
const userMessage = { role: 'user', content, id: Date.now() };
setMessages(prev => [...prev, userMessage]);
conversationRef.current.addMessage('user', content);
setIsStreaming(true);
abortControllerRef.current = new AbortController();
const aiMessage = { role: 'assistant', content: '', id: Date.now() + 1 };
setMessages(prev => [...prev, aiMessage]);
try {
const response = await fetch('http://localhost:3000/chat', {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({
message: content,
conversationHistory: conversationRef.current.getHistory()
}),
signal: abortControllerRef.current.signal
});
const reader = response.body.getReader();
const decoder = new TextDecoder();
let accumulatedContent = '';
let buffer = '';
while (true) {
const { done, value } = await reader.read();
if (done) break;
// Buffer across reads so an SSE event split between chunks is not lost
buffer += decoder.decode(value, { stream: true });
const events = buffer.split('\n\n');
buffer = events.pop();
for (const event of events) {
if (event.startsWith('data: ')) {
const dataStr = event.slice(6);
if (dataStr === '[DONE]') continue;
try {
const data = JSON.parse(dataStr);
accumulatedContent += data.content;
setMessages(prev =>
prev.map(msg =>
msg.id === aiMessage.id
? { ...msg, content: accumulatedContent }
: msg
)
);
} catch (e) {
console.error('Parse error:', e);
}
}
}
}
conversationRef.current.addMessage('assistant', accumulatedContent);
} catch (error) {
if (error.name !== 'AbortError') {
console.error('Streaming error:', error);
setMessages(prev =>
prev.map(msg =>
msg.id === aiMessage.id
? { ...msg, content: 'Error: Failed to get response', error: true }
: msg
)
);
}
} finally {
setIsStreaming(false);
abortControllerRef.current = null;
}
};
const stopStreaming = () => {
if (abortControllerRef.current) {
abortControllerRef.current.abort();
}
};
const clearConversation = () => {
setMessages([]);
conversationRef.current.clear();
};
return {
messages,
isStreaming,
sendMessage,
stopStreaming,
clearConversation
};
}
Conclusion
Streaming APIs transform how users interact with AI applications. By reducing perceived latency and providing immediate feedback, you create experiences that feel more responsive and engaging.
Key takeaways:
- SSE is ideal for AI chat - Simple, reliable, and purpose-built for server-to-client streaming
- Handle edge cases - Network failures, markdown rendering, and conversation context are critical
- Optimize performance - Batch DOM updates, debounce inputs, and consider virtual scrolling
- Test thoroughly - Stream parsing is complex; comprehensive tests prevent production issues
- Focus on UX - Typing indicators, smooth animations, and abort controls make the difference
The code patterns we’ve covered work across frameworks (React, Vue, Svelte) and AI providers (OpenAI, Anthropic, Google). Whether you’re building a customer support chatbot or an internal AI assistant, these techniques will help you deliver a polished, production-ready experience.
Ready to build? Start with the basic streaming example, then progressively enhance with error handling, markdown support, and conversation management. Your users will notice the difference.