In the era of Generative AI, users expect chat interfaces to feel alive. They don’t want to stare at a loading spinner for ten seconds while a Large Language Model (LLM) generates a complete paragraph. They want to see the text appear token by token, just like a human typing.
This “typing effect” isn’t just a UI trick; it’s a fundamental shift in how we handle data using Streaming APIs. In this guide, we’ll explore how to build a real-time AI chat application using JavaScript and Server-Sent Events (SSE).
The Problem: Request/Response vs. Streaming
Traditionally, web APIs work on a Request/Response model:
1. The client sends a prompt.
2. The server processes the entire prompt.
3. The server sends back the entire response.
With LLMs, step 2 can take a long time. If an answer is 500 words long, the user waits for the whole generation.
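A back-of-the-envelope calculation shows why this matters. The numbers below (20 tokens per second, a 500-token answer) are illustrative assumptions, not benchmarks:

```javascript
// Illustrative latency math: how long until the user sees *anything*?
const tokensPerSecond = 20; // assumed generation speed
const answerTokens = 500;   // assumed answer length

// Request/response: the user waits for the entire generation.
const waitForFullResponse = answerTokens / tokensPerSecond; // 25 seconds

// Streaming: the user sees output after the first token.
const waitForFirstToken = 1 / tokensPerSecond; // 0.05 seconds

console.log(waitForFullResponse, waitForFirstToken);
```

The total generation time is identical; streaming only changes the *perceived* latency, which is what users actually feel.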
Streaming changes this:
1. The client sends a prompt.
2. The server starts processing.
3. As soon as the server generates a “chunk” (a token or word), it sends it immediately.
4. The client renders chunks as they arrive.
The Tech Stack
- Frontend: Vanilla JavaScript (or React/Vue/Svelte)
- Backend: Node.js (Express or Edge Functions)
- Protocol: Server-Sent Events (SSE)
- AI Provider: OpenAI API (or Anthropic/Gemini)
Step 1: The Backend (Node.js)
We need an endpoint that doesn’t close the connection immediately. We will use the OpenAI Node SDK, which supports streaming out of the box.
// server.js
import express from 'express';
import OpenAI from 'openai';
import cors from 'cors';
const app = express();
const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });
app.use(cors());
app.use(express.json());
app.post('/chat', async (req, res) => {
const { message, conversationHistory = [] } = req.body;
// Set headers for Server-Sent Events
res.setHeader('Content-Type', 'text/event-stream');
res.setHeader('Cache-Control', 'no-cache');
res.setHeader('Connection', 'keep-alive');
try {
// Build messages array with conversation history
const messages = [
...conversationHistory,
{ role: 'user', content: message }
];
const stream = await openai.chat.completions.create({
model: 'gpt-4',
messages: messages,
stream: true, // This is crucial!
});
for await (const chunk of stream) {
const content = chunk.choices[0]?.delta?.content || '';
if (content) {
// Format data as SSE
res.write(`data: ${JSON.stringify({ content })}\n\n`);
}
}
res.write('data: [DONE]\n\n');
res.end();
} catch (error) {
console.error('Error:', error);
res.write(`data: ${JSON.stringify({ error: error.message })}\n\n`);
res.end();
}
});
app.listen(3000, () => console.log('Server running on port 3000'));
Key Takeaways
- Headers: Content-Type: text/event-stream tells the browser to keep the connection open.
- Looping: We iterate over the stream object provided by the SDK.
- Formatting: SSE requires each message to start with data: and end with \n\n.
- Context: We include conversationHistory to maintain context across messages.
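The formatting rule is easy to get wrong, so it is worth isolating. A tiny helper (hypothetical, not part of any SDK) keeps the server-side writes consistent:

```javascript
// Hypothetical helper: wrap a JSON payload as a single SSE event.
// Each event must start with "data: " and end with a blank line ("\n\n").
function sseEvent(payload) {
  return `data: ${JSON.stringify(payload)}\n\n`;
}

// The server's write loop then becomes:
//   res.write(sseEvent({ content }));
console.log(sseEvent({ content: 'Hello' }));
```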
Step 2: The Frontend (Client-Side JavaScript)
On the client, we can’t simply await response.json() for a complete payload. Instead, we read the ReadableStream exposed by the fetch API’s response body.
// client.js
async function sendMessage(userMessage) {
const chatBox = document.getElementById('chat-box');
// Create a placeholder for the AI response
const aiMessageElement = document.createElement('div');
aiMessageElement.className = 'ai-message';
chatBox.appendChild(aiMessageElement);
const response = await fetch('http://localhost:3000/chat', {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({
message: userMessage,
conversationHistory: getConversationHistory()
}),
});
const reader = response.body.getReader();
const decoder = new TextDecoder();
let buffer = '';
while (true) {
const { done, value } = await reader.read();
if (done) break;
// A network read can end mid-event, so buffer until a full "\n\n"-terminated event arrives
buffer += decoder.decode(value, { stream: true });
const events = buffer.split('\n\n');
buffer = events.pop(); // keep the (possibly incomplete) tail for the next read
for (const event of events) {
if (event.startsWith('data: ')) {
const dataStr = event.slice(6);
if (dataStr === '[DONE]') return;
try {
const data = JSON.parse(dataStr);
// Append the new token to the UI
aiMessageElement.textContent += data.content;
// Auto-scroll to bottom
chatBox.scrollTop = chatBox.scrollHeight;
} catch (e) {
console.error('Error parsing JSON', e);
}
}
}
}
}
How it works
- response.body.getReader(): Locks the stream to a single reader.
- reader.read(): Reads the next available chunk of binary data.
- TextDecoder: Converts the binary data into a string.
- Parsing: We strip the data: prefix and parse the JSON to get the actual text content.
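One subtlety worth underlining: a single network read is not guaranteed to contain whole events — a chunk can end in the middle of a data: line. A production parser should buffer text and only consume events that are fully terminated. A small pure function (hypothetical, not from any SDK) makes the idea testable:

```javascript
// Consume only complete SSE events ("data: ..." terminated by "\n\n")
// from a text buffer; return the extracted payloads and the leftover tail.
function extractSSEEvents(buffer) {
  const events = [];
  let idx;
  while ((idx = buffer.indexOf('\n\n')) !== -1) {
    const raw = buffer.slice(0, idx);
    buffer = buffer.slice(idx + 2);
    if (raw.startsWith('data: ')) {
      events.push(raw.slice(6)); // strip the "data: " prefix
    }
  }
  return { events, rest: buffer };
}

// A chunk that ends mid-event leaves the tail in `rest`;
// a later chunk appended to that tail completes it.
console.log(extractSSEEvents('data: {"content":"Hel'));
console.log(extractSSEEvents('data: {"content":"Hello"}\n\ndata: [DONE]\n\n'));
```

In the read loop, you would accumulate decoded text into a buffer, call this function, process `events`, and carry `rest` over to the next read.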
Handling Edge Cases
When building production apps, consider these challenges:
1. Markdown Rendering
Raw text looks boring. Use a library like marked to render markdown as the text streams in. (Note that marked does not sanitize its output; in production, run the generated HTML through a sanitizer such as DOMPurify before assigning it to innerHTML.)
import { marked } from 'marked';
async function sendMessage(userMessage) {
const chatBox = document.getElementById('chat-box');
const aiMessageElement = document.createElement('div');
aiMessageElement.className = 'ai-message';
chatBox.appendChild(aiMessageElement);
let accumulatedText = '';
const response = await fetch('http://localhost:3000/chat', {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({ message: userMessage }),
});
const reader = response.body.getReader();
const decoder = new TextDecoder();
let buffer = '';
while (true) {
const { done, value } = await reader.read();
if (done) break;
// Buffer across reads so an SSE event split between chunks is not lost
buffer += decoder.decode(value, { stream: true });
const events = buffer.split('\n\n');
buffer = events.pop();
for (const event of events) {
if (event.startsWith('data: ')) {
const dataStr = event.slice(6);
if (dataStr === '[DONE]') {
// Final render with complete markdown
aiMessageElement.innerHTML = marked.parse(accumulatedText);
return;
}
try {
const data = JSON.parse(dataStr);
accumulatedText += data.content;
// Render markdown incrementally
aiMessageElement.innerHTML = marked.parse(accumulatedText);
chatBox.scrollTop = chatBox.scrollHeight;
} catch (e) {
console.error('Error parsing JSON', e);
}
}
}
}
}
2. Network Interruptions & Retry Logic
async function sendMessageWithRetry(userMessage, maxRetries = 3) {
let attempt = 0;
while (attempt < maxRetries) {
try {
await sendMessage(userMessage);
return; // Success
} catch (error) {
attempt++;
console.error(`Attempt ${attempt} failed:`, error);
if (attempt >= maxRetries) {
showError('Failed to get response. Please try again.');
throw error;
}
// Exponential backoff
await new Promise(resolve => setTimeout(resolve, Math.pow(2, attempt) * 1000));
}
}
}
function showError(message) {
const chatBox = document.getElementById('chat-box');
const errorElement = document.createElement('div');
errorElement.className = 'error-message';
errorElement.textContent = message;
chatBox.appendChild(errorElement);
}
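The delay schedule inside the retry loop is easier to reason about (and to test) as a standalone helper — a hypothetical refactor, not a requirement:

```javascript
// Exponential backoff: 2^attempt * base, matching the retry loop above.
function backoffDelayMs(attempt, baseMs = 1000) {
  return Math.pow(2, attempt) * baseMs;
}

console.log([1, 2, 3].map(a => backoffDelayMs(a))); // [2000, 4000, 8000]
```

Many production clients also add random jitter to these delays so simultaneous retries from many clients don't hit the server in lockstep.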
3. Managing Conversation Context
class ConversationManager {
constructor(maxTokens = 4000) {
this.messages = [];
this.maxTokens = maxTokens;
}
addMessage(role, content) {
this.messages.push({ role, content });
this.trimIfNeeded();
}
getHistory() {
return this.messages;
}
trimIfNeeded() {
// Simple token estimation (4 chars ≈ 1 token)
const estimatedTokens = this.messages.reduce((sum, msg) =>
sum + msg.content.length / 4, 0
);
if (estimatedTokens > this.maxTokens) {
// Keep system message and remove oldest user/assistant pairs
const systemMessages = this.messages.filter(m => m.role === 'system');
const otherMessages = this.messages.filter(m => m.role !== 'system');
// Remove oldest messages but keep recent context
const messagesToKeep = otherMessages.slice(-10);
this.messages = [...systemMessages, ...messagesToKeep];
}
}
clear() {
this.messages = [];
}
}
// Usage
const conversation = new ConversationManager();
async function sendMessage(userMessage) {
conversation.addMessage('user', userMessage);
const response = await fetch('http://localhost:3000/chat', {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({
message: userMessage,
conversationHistory: conversation.getHistory()
}),
});
// ... handle streaming
// After receiving complete response
conversation.addMessage('assistant', completeResponse);
}
UI/UX Best Practices for Chat Interfaces
Creating a great chat experience goes beyond just displaying text:
Visual Feedback
function showTypingIndicator() {
const chatBox = document.getElementById('chat-box');
const indicator = document.createElement('div');
indicator.className = 'typing-indicator';
indicator.innerHTML = `
<span></span>
<span></span>
<span></span>
`;
indicator.id = 'typing-indicator';
chatBox.appendChild(indicator);
}
function hideTypingIndicator() {
document.getElementById('typing-indicator')?.remove();
}
async function sendMessage(userMessage) {
showTypingIndicator();
try {
// ... streaming logic
} finally {
hideTypingIndicator();
}
}
CSS for Typing Effect
.typing-indicator {
display: flex;
gap: 4px;
padding: 12px;
background: #f0f0f0;
border-radius: 8px;
width: fit-content;
}
.typing-indicator span {
width: 8px;
height: 8px;
background: #999;
border-radius: 50%;
animation: bounce 1.4s infinite ease-in-out;
}
.typing-indicator span:nth-child(1) {
animation-delay: -0.32s;
}
.typing-indicator span:nth-child(2) {
animation-delay: -0.16s;
}
@keyframes bounce {
0%, 80%, 100% { transform: scale(0); }
40% { transform: scale(1); }
}
.ai-message {
background: #f8f9fa;
padding: 12px 16px;
border-radius: 8px;
margin: 8px 0;
max-width: 80%;
animation: fadeIn 0.3s ease-in;
}
@keyframes fadeIn {
from { opacity: 0; transform: translateY(10px); }
to { opacity: 1; transform: translateY(0); }
}
Abort Streaming
Allow users to stop generation mid-stream:
let currentAbortController = null;
async function sendMessage(userMessage) {
// Cancel any ongoing request
if (currentAbortController) {
currentAbortController.abort();
}
currentAbortController = new AbortController();
const response = await fetch('http://localhost:3000/chat', {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({ message: userMessage }),
signal: currentAbortController.signal
});
// ... streaming logic
}
// Add stop button
document.getElementById('stop-btn').addEventListener('click', () => {
if (currentAbortController) {
currentAbortController.abort();
currentAbortController = null;
}
});
SSE vs WebSockets: Which to Choose?
| Feature | Server-Sent Events (SSE) | WebSockets |
|---|---|---|
| Direction | Server → Client only | Bidirectional |
| Protocol | HTTP | ws:// or wss:// |
| Auto-Reconnect | Built-in | Manual implementation |
| Complexity | Simple | More complex |
| Use Case | AI streaming, notifications | Real-time chat, games |
| Browser Support | Excellent (except IE) | Excellent |
For AI chat applications, SSE is usually the better choice because:
- You only need server-to-client streaming
- Built-in reconnection logic (via the browser’s EventSource; fetch-based streams need manual retry)
- Works over standard HTTP/HTTPS
- Simpler to implement and debug
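If your endpoint can deliver events over GET, the browser’s built-in EventSource gives you that reconnection behavior for free. (This is an assumption about your API shape: EventSource cannot send a POST body, so the prompt would have to travel as a query parameter, unlike the server shown earlier.) Keeping the event handling in a pure function also makes it testable outside the browser:

```javascript
// Pure handler: decide what to do with one SSE data payload.
// `appendText` and `close` are injected so this runs without a DOM.
function handleEventData(data, appendText, close) {
  if (data === '[DONE]') {
    close();
    return;
  }
  const { content } = JSON.parse(data);
  if (content) appendText(content);
}

// Browser wiring (sketch; assumes a hypothetical GET /chat endpoint):
// const source = new EventSource(`/chat?message=${encodeURIComponent(msg)}`);
// source.onmessage = (e) => handleEventData(
//   e.data,
//   (text) => { chatBox.textContent += text; },
//   () => source.close()
// );
```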
When to Use WebSockets
Consider WebSockets when you need:
- Real-time collaboration (multiple users editing)
- Gaming applications
- Truly bidirectional communication
- Lower latency for frequent small messages
Performance Optimization Techniques
1. Debounce User Input
function debounce(func, wait) {
let timeout;
return function executedFunction(...args) {
const later = () => {
clearTimeout(timeout);
func(...args);
};
clearTimeout(timeout);
timeout = setTimeout(later, wait);
};
}
const debouncedSend = debounce(sendMessage, 300);
2. Virtual Scrolling for Long Conversations
For chats with hundreds of messages, render only visible messages:
// Using a library like react-window or virtual-scroller
import { VirtualScroller } from 'virtual-scroller';
const scroller = new VirtualScroller({
container: document.getElementById('chat-box'),
items: messages,
itemHeight: 100,
renderItem: (message) => {
const div = document.createElement('div');
div.className = `message ${message.role}`;
div.innerHTML = marked.parse(message.content);
return div;
}
});
3. Batch DOM Updates
async function streamWithBatching(userMessage) {
const chatBox = document.getElementById('chat-box');
const aiMessageElement = document.createElement('div');
let accumulatedText = '';
let batchBuffer = '';
let lastUpdate = Date.now();
// ... fetch and SSE parsing as before; assume `stream` yields parsed { content } chunks
for await (const chunk of stream) {
batchBuffer += chunk.content;
// Update UI every 50ms or when buffer reaches threshold
if (Date.now() - lastUpdate > 50 || batchBuffer.length > 20) {
accumulatedText += batchBuffer;
aiMessageElement.innerHTML = marked.parse(accumulatedText);
batchBuffer = '';
lastUpdate = Date.now();
}
}
// Final update
if (batchBuffer) {
accumulatedText += batchBuffer;
aiMessageElement.innerHTML = marked.parse(accumulatedText);
}
}
Testing Streaming Endpoints
Backend Testing with Jest
// __tests__/chat.test.js
import request from 'supertest';
import app, { openai } from '../server.js'; // assumes server.js exports both
describe('POST /chat', () => {
it('should stream responses', async () => {
const chunks = [];
const response = await request(app)
.post('/chat')
.send({ message: 'Hello' })
.buffer(false)
.parse((res, callback) => {
res.on('data', (chunk) => {
chunks.push(chunk.toString());
});
res.on('end', () => {
callback(null, chunks);
});
});
expect(chunks.length).toBeGreaterThan(0);
expect(chunks.some(c => c.includes('[DONE]'))).toBe(true);
});
it('should handle errors gracefully', async () => {
// Mock the OpenAI client to throw (assumes `openai` is exported from server.js)
jest.spyOn(openai.chat.completions, 'create')
.mockRejectedValue(new Error('API Error'));
const response = await request(app)
.post('/chat')
.send({ message: 'Hello' });
expect(response.status).toBe(200);
expect(response.text).toContain('error');
});
});
Frontend Testing with Testing Library
// __tests__/chat-ui.test.js
import { render, screen, waitFor } from '@testing-library/react';
import userEvent from '@testing-library/user-event';
import ChatComponent from '../ChatComponent';
// Mock fetch
global.fetch = jest.fn();
describe('ChatComponent', () => {
beforeEach(() => {
fetch.mockClear();
});
it('displays streamed messages', async () => {
const encoder = new TextEncoder();
const mockStream = new ReadableStream({
start(controller) {
// Enqueue bytes, not strings: the component decodes with TextDecoder
controller.enqueue(encoder.encode('data: {"content":"Hello"}\n\n'));
controller.enqueue(encoder.encode('data: {"content":" world"}\n\n'));
controller.enqueue(encoder.encode('data: [DONE]\n\n'));
controller.close();
}
});
fetch.mockResolvedValue({
ok: true,
body: mockStream
});
render(<ChatComponent />);
const input = screen.getByRole('textbox');
await userEvent.type(input, 'Hi');
await userEvent.click(screen.getByRole('button', { name: /send/i }));
await waitFor(() => {
expect(screen.getByText(/Hello world/i)).toBeInTheDocument();
});
});
});
Complete Demo Project Structure
Here’s a production-ready project structure:
ai-chat-app/
├── backend/
│   ├── src/
│   │   ├── controllers/
│   │   │   └── chatController.js
│   │   ├── middleware/
│   │   │   ├── errorHandler.js
│   │   │   └── rateLimit.js
│   │   ├── services/
│   │   │   └── openaiService.js
│   │   ├── utils/
│   │   │   └── streamHelpers.js
│   │   └── server.js
│   ├── __tests__/
│   │   └── chat.test.js
│   ├── package.json
│   └── .env.example
├── frontend/
│   ├── src/
│   │   ├── components/
│   │   │   ├── ChatBox.js
│   │   │   ├── MessageList.js
│   │   │   ├── MessageInput.js
│   │   │   └── TypingIndicator.js
│   │   ├── services/
│   │   │   └── apiClient.js
│   │   ├── hooks/
│   │   │   └── useChat.js
│   │   ├── utils/
│   │   │   ├── conversationManager.js
│   │   │   └── streamParser.js
│   │   ├── styles/
│   │   │   └── chat.css
│   │   └── App.js
│   ├── __tests__/
│   │   └── ChatBox.test.js
│   └── package.json
└── README.md
Example: useChat.js Hook
// frontend/src/hooks/useChat.js
import { useState, useRef } from 'react';
import { ConversationManager } from '../utils/conversationManager';
export function useChat() {
const [messages, setMessages] = useState([]);
const [isStreaming, setIsStreaming] = useState(false);
const conversationRef = useRef(new ConversationManager());
const abortControllerRef = useRef(null);
const sendMessage = async (content) => {
if (isStreaming) return;
const userMessage = { role: 'user', content, id: Date.now() };
setMessages(prev => [...prev, userMessage]);
conversationRef.current.addMessage('user', content);
setIsStreaming(true);
abortControllerRef.current = new AbortController();
const aiMessage = { role: 'assistant', content: '', id: Date.now() + 1 };
setMessages(prev => [...prev, aiMessage]);
try {
const response = await fetch('http://localhost:3000/chat', {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({
message: content,
conversationHistory: conversationRef.current.getHistory()
}),
signal: abortControllerRef.current.signal
});
const reader = response.body.getReader();
const decoder = new TextDecoder();
let accumulatedContent = '';
let buffer = '';
while (true) {
const { done, value } = await reader.read();
if (done) break;
// Buffer across reads so an SSE event split between chunks is not lost
buffer += decoder.decode(value, { stream: true });
const events = buffer.split('\n\n');
buffer = events.pop();
for (const event of events) {
if (event.startsWith('data: ')) {
const dataStr = event.slice(6);
if (dataStr === '[DONE]') continue;
try {
const data = JSON.parse(dataStr);
accumulatedContent += data.content;
setMessages(prev =>
prev.map(msg =>
msg.id === aiMessage.id
? { ...msg, content: accumulatedContent }
: msg
)
);
} catch (e) {
console.error('Parse error:', e);
}
}
}
}
conversationRef.current.addMessage('assistant', accumulatedContent);
} catch (error) {
if (error.name !== 'AbortError') {
console.error('Streaming error:', error);
setMessages(prev =>
prev.map(msg =>
msg.id === aiMessage.id
? { ...msg, content: 'Error: Failed to get response', error: true }
: msg
)
);
}
} finally {
setIsStreaming(false);
abortControllerRef.current = null;
}
};
const stopStreaming = () => {
if (abortControllerRef.current) {
abortControllerRef.current.abort();
}
};
const clearConversation = () => {
setMessages([]);
conversationRef.current.clear();
};
return {
messages,
isStreaming,
sendMessage,
stopStreaming,
clearConversation
};
}
Conclusion
Streaming APIs transform how users interact with AI applications. By reducing perceived latency and providing immediate feedback, you create experiences that feel more responsive and engaging.
Key takeaways:
- SSE is ideal for AI chat - Simple, reliable, and purpose-built for server-to-client streaming
- Handle edge cases - Network failures, markdown rendering, and conversation context are critical
- Optimize performance - Batch DOM updates, debounce inputs, and consider virtual scrolling
- Test thoroughly - Stream parsing is complex; comprehensive tests prevent production issues
- Focus on UX - Typing indicators, smooth animations, and abort controls make the difference
The code patterns we’ve covered work across frameworks (React, Vue, Svelte) and AI providers (OpenAI, Anthropic, Google). Whether you’re building a customer support chatbot or an internal AI assistant, these techniques will help you deliver a polished, production-ready experience.
Ready to build? Start with the basic streaming example, then progressively enhance with error handling, markdown support, and conversation management. Your users will notice the difference.