Building AI-powered features into web applications has become increasingly practical and affordable. Whether you’re adding intelligent chatbots, content generation, or analytical capabilities, integrating Large Language Models (LLMs) from providers like OpenAI and Anthropic can transform your application’s functionality.
This guide provides practical guidance for web developers implementing LLM APIs, covering everything from authentication to production-grade error handling and cost optimization.
Understanding the LLM API Landscape
OpenAI API Overview
OpenAI exposes its models through the Chat Completions API, with GPT-4 and GPT-3.5-turbo as the primary options:
- GPT-4: Most capable, best for complex reasoning, higher cost (~$0.03-0.06 per 1K tokens input, ~$0.06-0.12 per 1K tokens output)
- GPT-3.5-turbo: Fast and affordable, excellent for most tasks (~$0.50 per 1M tokens input, ~$1.50 per 1M tokens output)
Best for: General-purpose tasks, function calling, complex reasoning, established API ecosystem
Anthropic Claude API Overview
Anthropic provides the Claude 3 model family in ascending capability tiers (Haiku, Sonnet, Opus):
- Claude 3 Opus: Most advanced, best for complex tasks (~$0.015 per 1K tokens input, ~$0.075 per 1K tokens output)
- Claude 3 Sonnet: Balanced performance/cost (~$0.003 per 1K tokens input, ~$0.015 per 1K tokens output)
- Claude 3 Haiku: Fast and cheap (~$0.00025 per 1K tokens input, ~$0.00125 per 1K tokens output)
Best for: Long-context processing (200K tokens), safety-conscious applications, nuanced instruction following
Key Differences
| Aspect | OpenAI | Anthropic |
|---|---|---|
| Token Limit | 4K-128K tokens | Up to 200K tokens |
| Streaming | ✅ Yes | ✅ Yes |
| Function Calling | ✅ Native support | ⚠️ Via structured output |
| API Maturity | Mature, stable | Growing, solid |
| Regional Availability | Global | Global |
| Safety Features | Moderate | Strong constitutional AI focus |
| Cost (Entry) | Medium | Low to Very Low |
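Pricing differences like these are easiest to reason about with a small calculator. The sketch below uses illustrative per-million-token rates taken from the figures above (always check the providers' current pricing pages); `estimateCostUSD` and the `PRICING` table are names invented for this example:

```typescript
// Rough per-request cost estimator. Rates are illustrative, taken from the
// tables above; check current provider pricing before relying on them.
type Pricing = { inputPerMTok: number; outputPerMTok: number };

const PRICING: Record<string, Pricing> = {
  'gpt-3.5-turbo': { inputPerMTok: 0.5, outputPerMTok: 1.5 },
  'claude-3-haiku': { inputPerMTok: 0.25, outputPerMTok: 1.25 },
  'claude-3-sonnet': { inputPerMTok: 3, outputPerMTok: 15 },
};

function estimateCostUSD(
  model: string,
  inputTokens: number,
  outputTokens: number
): number {
  const p = PRICING[model];
  if (!p) throw new Error(`Unknown model: ${model}`);
  return (
    (inputTokens / 1_000_000) * p.inputPerMTok +
    (outputTokens / 1_000_000) * p.outputPerMTok
  );
}
```

Running this kind of estimate against your expected traffic before launch makes the model-selection trade-offs below much more concrete.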
Authentication and API Key Management
Setting Up API Keys
OpenAI Setup:
- Create account at https://platform.openai.com
- Navigate to API keys section
- Generate new secret key
- Store securely (never commit to version control)
Anthropic Setup:
- Create account at https://console.anthropic.com
- Go to API keys
- Create new key
- Store securely
Secure Key Management
Never store API keys in code. Use environment variables:
# .env.local (never commit this file)
OPENAI_API_KEY=sk-...
ANTHROPIC_API_KEY=sk-ant-...
// Load from environment
const openaiApiKey = process.env.OPENAI_API_KEY;
const anthropicApiKey = process.env.ANTHROPIC_API_KEY;
if (!openaiApiKey) {
throw new Error('OPENAI_API_KEY not set');
}
For production, use your platform’s secret management:
- AWS: AWS Secrets Manager or Parameter Store
- Vercel: Environment variables in project settings
- Docker: Use --env flags or secrets files
- Kubernetes: Use Secrets objects
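The fail-fast check shown earlier can be folded into a small helper so every key is validated the same way at startup; `requireEnv` is just a name chosen for this sketch, not part of any SDK:

```typescript
// Fail fast when a required environment variable is missing, instead of
// letting an undefined API key surface as a confusing 401 later.
function requireEnv(name: string): string {
  const value = process.env[name];
  if (!value) {
    throw new Error(`Missing required environment variable: ${name}`);
  }
  return value;
}

// Usage:
// const openaiApiKey = requireEnv('OPENAI_API_KEY');
// const anthropicApiKey = requireEnv('ANTHROPIC_API_KEY');
```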
Basic Integration: Request/Response Patterns
OpenAI Chat Completion
import OpenAI from 'openai';
const client = new OpenAI({
apiKey: process.env.OPENAI_API_KEY,
});
async function generateContent(prompt: string): Promise<string> {
const message = await client.chat.completions.create({
model: 'gpt-3.5-turbo',
messages: [
{
role: 'user',
content: prompt,
},
],
temperature: 0.7,
max_tokens: 500,
});
// Extract text from first completion choice
const content = message.choices[0]?.message?.content;
if (!content) {
throw new Error('No response from OpenAI');
}
return content;
}
// Usage
const response = await generateContent('Explain async/await in JavaScript');
console.log(response);
Anthropic Claude API
import Anthropic from '@anthropic-ai/sdk';
const client = new Anthropic({
apiKey: process.env.ANTHROPIC_API_KEY,
});
async function generateContent(prompt: string): Promise<string> {
const message = await client.messages.create({
model: 'claude-3-sonnet-20240229',
max_tokens: 500,
messages: [
{
role: 'user',
content: prompt,
},
],
});
// Extract text from content block
const textBlock = message.content.find((block) => block.type === 'text');
if (!textBlock || textBlock.type !== 'text') {
throw new Error('No text response from Claude');
}
return textBlock.text;
}
// Usage
const response = await generateContent('Explain async/await in JavaScript');
console.log(response);
Maintaining Conversation Context
For multi-turn conversations, track message history:
interface Message {
role: 'user' | 'assistant';
content: string;
}
class ConversationManager {
private messages: Message[] = [];
addUserMessage(content: string): void {
this.messages.push({ role: 'user', content });
}
addAssistantMessage(content: string): void {
this.messages.push({ role: 'assistant', content });
}
async getResponse(client: OpenAI): Promise<string> {
const response = await client.chat.completions.create({
model: 'gpt-3.5-turbo',
messages: this.messages,
temperature: 0.7,
});
const assistantMessage =
response.choices[0]?.message?.content || '';
this.addAssistantMessage(assistantMessage);
return assistantMessage;
}
getHistory(): Message[] {
return this.messages;
}
clear(): void {
this.messages = [];
}
}
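One caveat with the manager above: the history grows without bound, which eventually exceeds the model's context window and inflates cost on every call. A simple mitigation is to keep only the most recent turns. This sketch uses a character budget as a crude stand-in for token counting (roughly 4 characters per token for English); `trimHistory` is a name invented here:

```typescript
interface Message {
  role: 'user' | 'assistant';
  content: string;
}

// Keep the most recent messages that fit within a rough character budget.
// Characters are a crude proxy for tokens; use a tokenizer for precision.
function trimHistory(messages: Message[], maxChars = 8000): Message[] {
  const kept: Message[] = [];
  let total = 0;
  // Walk backwards so the newest messages survive.
  for (let i = messages.length - 1; i >= 0; i--) {
    total += messages[i].content.length;
    if (total > maxChars && kept.length > 0) break;
    kept.unshift(messages[i]);
  }
  return kept;
}
```

Calling this before each request (and optionally pinning the system prompt separately) keeps long conversations within budget.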
Handling Streaming Responses
Streaming provides real-time feedback, crucial for user experience:
OpenAI Streaming
async function* streamOpenAIResponse(
prompt: string,
client: OpenAI
): AsyncGenerator<string> {
  const stream = await client.chat.completions.create({
    model: 'gpt-3.5-turbo',
    messages: [{ role: 'user', content: prompt }],
    stream: true,
  });
for await (const chunk of stream) {
const content = chunk.choices[0]?.delta?.content;
if (content) {
yield content;
}
}
}
// Usage in API route
export async function POST(request: Request) {
const { prompt } = await request.json();
const encoder = new TextEncoder();
const readable = new ReadableStream({
async start(controller) {
for await (const chunk of streamOpenAIResponse(
prompt,
openaiClient
)) {
controller.enqueue(encoder.encode(chunk));
}
controller.close();
},
});
return new Response(readable, {
headers: {
'Content-Type': 'text/event-stream',
'Cache-Control': 'no-cache',
},
});
}
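On the client side, the streamed body can be consumed incrementally with the Fetch API's reader so text appears as it arrives. This is a sketch: `consumeStream` is a name invented here, and the `/api/generate` path in the usage comment is illustrative, standing in for whatever route hosts the handler above:

```typescript
// Read a streamed response body chunk by chunk, invoking a callback as
// text arrives, and return the accumulated full text.
async function consumeStream(
  body: ReadableStream<Uint8Array>,
  onChunk: (text: string) => void
): Promise<string> {
  const reader = body.getReader();
  const decoder = new TextDecoder();
  let full = '';
  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    const text = decoder.decode(value, { stream: true });
    full += text;
    onChunk(text);
  }
  return full;
}

// Usage in the browser (path is illustrative):
// const res = await fetch('/api/generate', {
//   method: 'POST',
//   body: JSON.stringify({ prompt }),
// });
// if (res.body) {
//   await consumeStream(res.body, (text) => appendToUI(text));
// }
```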
Anthropic Streaming
async function* streamClaudeResponse(
prompt: string,
client: Anthropic
): AsyncGenerator<string> {
  const stream = client.messages.stream({
model: 'claude-3-sonnet-20240229',
max_tokens: 500,
messages: [{ role: 'user', content: prompt }],
});
for await (const event of stream) {
if (
event.type === 'content_block_delta' &&
event.delta.type === 'text_delta'
) {
yield event.delta.text;
}
}
}
Error Handling and Retry Logic
Production applications require robust error handling:
interface RetryConfig {
maxRetries: number;
delayMs: number;
backoffMultiplier: number;
}
async function callWithRetry<T>(
fn: () => Promise<T>,
config: RetryConfig = {
maxRetries: 3,
delayMs: 1000,
backoffMultiplier: 2,
}
): Promise<T> {
let lastError: Error | null = null;
let delay = config.delayMs;
for (let attempt = 0; attempt <= config.maxRetries; attempt++) {
try {
return await fn();
} catch (error) {
lastError = error as Error;
      // Don't retry on client errors (auth failures, invalid requests); retrying won't help
if (
error instanceof Error &&
(error.message.includes('401') ||
error.message.includes('403') ||
error.message.includes('invalid_request_error'))
) {
throw error;
}
if (attempt < config.maxRetries) {
console.log(
`Attempt ${attempt + 1} failed. Retrying in ${delay}ms...`
);
await new Promise((resolve) => setTimeout(resolve, delay));
delay *= config.backoffMultiplier;
}
}
}
throw lastError || new Error('Max retries exceeded');
}
// Usage
const response = await callWithRetry(() =>
client.chat.completions.create({
model: 'gpt-3.5-turbo',
messages: [{ role: 'user', content: 'Hello' }],
})
);
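Fixed exponential delays can synchronize retries across many clients (the "thundering herd" problem). Adding random jitter spreads them out. The sketch below computes a "full jitter" delay, picking a random value up to the exponential cap; `jitteredDelay` is a name invented for this example and could replace the fixed `delay *= backoffMultiplier` step above:

```typescript
// "Full jitter" retry delay: random between 0 and the exponential cap,
// bounded by maxDelayMs so late attempts don't wait forever.
function jitteredDelay(
  attempt: number,
  baseMs = 1000,
  maxDelayMs = 30_000
): number {
  const cap = Math.min(maxDelayMs, baseMs * 2 ** attempt);
  return Math.random() * cap;
}
```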
Cost Optimization Strategies
1. Prompt Caching (OpenAI)
// Keep unchanging instructions identical and at the start of every request;
// where the provider supports prefix caching, the repeated prefix can be
// served from cache at a discount instead of being reprocessed each call
const cachedSystemPrompt = `You are a helpful assistant.
Your responses should be concise and accurate.
Always validate user input before processing.`;
async function generateWithCache(userMessage: string) {
return await client.chat.completions.create({
model: 'gpt-4-turbo',
messages: [
{
role: 'system',
content: cachedSystemPrompt,
},
{
role: 'user',
content: userMessage,
},
],
});
}
2. Token Management
import { encodingForModel, TiktokenModel } from 'js-tiktoken';
function estimateTokens(text: string, model: TiktokenModel): number {
  const encoding = encodingForModel(model);
  return encoding.encode(text).length;
}
}
function shouldUseCheaperModel(tokens: number): boolean {
// Use cheaper model for simple queries
return tokens < 500;
}
async function smartModelSelection(prompt: string) {
const tokens = estimateTokens(prompt, 'gpt-3.5-turbo');
const model =
shouldUseCheaperModel(tokens) ?
'gpt-3.5-turbo' :
'gpt-4-turbo';
return await client.chat.completions.create({
model,
messages: [{ role: 'user', content: prompt }],
});
}
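When pulling in a tokenizer dependency is overkill, a character-based heuristic (roughly 4 characters per token for English prose) is often good enough for routing decisions like the one above; `roughTokenEstimate` is a name invented for this sketch:

```typescript
// Very rough token estimate: ~4 characters per token for English text.
// Use a real tokenizer (tiktoken) when billing accuracy matters.
function roughTokenEstimate(text: string): number {
  return Math.ceil(text.length / 4);
}
```

The heuristic overestimates for code and non-Latin scripts, so treat it as a routing signal, not a billing figure.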
3. Response Caching
const responseCache = new Map<string, string>();
const CACHE_TTL_MS = 3600000; // 1 hour
async function getCachedResponse(
prompt: string
): Promise<string> {
// Check cache
if (responseCache.has(prompt)) {
console.log('Cache hit');
return responseCache.get(prompt)!;
}
// Generate and cache
const response = await client.chat.completions.create({
model: 'gpt-3.5-turbo',
messages: [{ role: 'user', content: prompt }],
});
const content = response.choices[0]?.message?.content || '';
responseCache.set(prompt, content);
// Clear cache after TTL
setTimeout(() => responseCache.delete(prompt), CACHE_TTL_MS);
return content;
}
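Using the raw prompt as the cache key works, but long prompts make large Map keys, and near-duplicate prompts (extra whitespace, different casing) miss the cache. Hashing a normalized prompt together with the model name addresses both. This sketch uses Node's built-in crypto module; `cacheKey` is a name invented here, and note that lowercasing is optional since it may merge prompts you would rather keep distinct:

```typescript
import { createHash } from 'node:crypto';

// Build a compact, normalized cache key from model + prompt.
function cacheKey(model: string, prompt: string): string {
  const normalized = prompt.trim().replace(/\s+/g, ' ').toLowerCase();
  return createHash('sha256')
    .update(`${model}:${normalized}`)
    .digest('hex');
}
```

Swap `responseCache.has(prompt)` for `responseCache.has(cacheKey(model, prompt))` to benefit from the normalization.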
Security Best Practices
Input Sanitization
function sanitizeUserInput(input: string): string {
// Remove null bytes
let sanitized = input.replace(/\0/g, '');
// Limit length
if (sanitized.length > 10000) {
sanitized = sanitized.substring(0, 10000);
}
// Remove potentially dangerous patterns
sanitized = sanitized.replace(/<script[^>]*>.*?<\/script>/gi, '');
return sanitized.trim();
}
async function safeGenerate(userPrompt: string) {
const sanitized = sanitizeUserInput(userPrompt);
return await client.chat.completions.create({
model: 'gpt-3.5-turbo',
messages: [{ role: 'user', content: sanitized }],
});
}
Output Validation
function isValidResponse(response: string): boolean {
// Check for suspicious content
const suspiciousPatterns = [
/api[_-]key/i,
/password/i,
/secret/i,
];
return !suspiciousPatterns.some((pattern) =>
pattern.test(response)
);
}
async function getValidatedResponse(prompt: string) {
const response = await client.chat.completions.create({
model: 'gpt-3.5-turbo',
messages: [{ role: 'user', content: prompt }],
});
const content = response.choices[0]?.message?.content || '';
if (!isValidResponse(content)) {
console.warn('Suspicious response detected');
return 'Unable to process that request safely.';
}
return content;
}
Rate Limiting
import { RateLimiter } from 'limiter';
const limiter = new RateLimiter({
tokensPerInterval: 100,
interval: 'minute',
});
async function rateLimitedRequest(prompt: string) {
await limiter.removeTokens(1);
return await client.chat.completions.create({
model: 'gpt-3.5-turbo',
messages: [{ role: 'user', content: prompt }],
});
}
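If you would rather not add a dependency, a minimal token bucket covers the same need. This sketch refills continuously based on elapsed time; `TokenBucket` is a name invented here, and the injectable clock exists purely to make the class testable:

```typescript
// Minimal token bucket: `capacity` requests per `refillMs` window,
// refilled continuously rather than in discrete steps.
class TokenBucket {
  private capacity: number;
  private refillMs: number;
  private clock: () => number;
  private tokens: number;
  private lastRefill: number;

  constructor(
    capacity: number,
    refillMs: number,
    clock: () => number = Date.now
  ) {
    this.capacity = capacity;
    this.refillMs = refillMs;
    this.clock = clock;
    this.tokens = capacity;
    this.lastRefill = clock();
  }

  // Returns true if a request may proceed, false if it should be rejected
  // or delayed by the caller.
  tryRemove(): boolean {
    const now = this.clock();
    const elapsed = now - this.lastRefill;
    this.tokens = Math.min(
      this.capacity,
      this.tokens + (elapsed / this.refillMs) * this.capacity
    );
    this.lastRefill = now;
    if (this.tokens >= 1) {
      this.tokens -= 1;
      return true;
    }
    return false;
  }
}
```

Unlike the `limiter` example above, `tryRemove` rejects immediately instead of waiting, which suits API routes that should return 429 under load.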
Practical Use Cases
Chat Interface
// Ideal for: Conversational AI, customer support
// Use: Claude 3 Sonnet or GPT-3.5-turbo
// Reason: Cost-effective, context retention
async function chatEndpoint(request: Request) {
const { messages } = await request.json();
const response = await client.chat.completions.create({
model: 'gpt-3.5-turbo',
messages,
temperature: 0.7,
});
return new Response(
JSON.stringify({ response: response.choices[0]?.message?.content }),
{ headers: { 'Content-Type': 'application/json' } }
);
}
Content Generation
// Ideal for: Blog posts, product descriptions, emails
// Use: GPT-4 or Claude 3 Opus
// Reason: Better quality output justifies higher cost
async function generateProductDescription(productName: string) {
return await client.chat.completions.create({
model: 'gpt-4-turbo',
messages: [
{
role: 'system',
content:
'Write compelling, SEO-friendly product descriptions.',
},
{
role: 'user',
content: `Generate a product description for: ${productName}`,
},
],
});
}
Code Analysis
// Ideal for: Code review, documentation generation
// Use: Claude 3 Opus or GPT-4
// Reason: Complex reasoning required
async function analyzeCode(code: string) {
return await anthropicClient.messages.create({
model: 'claude-3-opus-20240229',
max_tokens: 2000,
messages: [
{
role: 'user',
content: `Review this code and suggest improvements:\n\n${code}`,
},
],
});
}
Performance Optimization
Prompt Engineering
Write clear, specific prompts to reduce token usage:
// ❌ Inefficient
'Tell me about JavaScript'
// ✅ Efficient
'Explain JavaScript closures with one code example'
// ✅ Even better: constrain the length explicitly (Claude follows this especially well)
'Explain JavaScript closures. Respond in 150 words or less.'
Batch Processing
For multiple requests, process efficiently:
async function batchProcess(items: string[]) {
// Process with concurrency limit
const batchSize = 10;
const results = [];
for (let i = 0; i < items.length; i += batchSize) {
const batch = items.slice(i, i + batchSize);
const batchResults = await Promise.all(
batch.map((item) =>
client.chat.completions.create({
model: 'gpt-3.5-turbo',
messages: [{ role: 'user', content: item }],
})
)
);
results.push(...batchResults);
}
return results;
}
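Fixed-size batches wait for the slowest request in each batch before starting the next. A worker-pool pattern keeps a constant number of requests in flight instead, which usually finishes sooner when latencies vary. This sketch is generic over any async function; `pooledMap` is a name invented for this example:

```typescript
// Run tasks with at most `concurrency` in flight at once, preserving
// input order in the results array.
async function pooledMap<T, R>(
  items: T[],
  fn: (item: T) => Promise<R>,
  concurrency = 10
): Promise<R[]> {
  const results: R[] = new Array(items.length);
  let next = 0;

  // Each worker repeatedly claims the next unclaimed index. JS is
  // single-threaded, so `next++` is safe between awaits.
  async function worker(): Promise<void> {
    while (next < items.length) {
      const index = next++;
      results[index] = await fn(items[index]);
    }
  }

  const workers = Array.from(
    { length: Math.min(concurrency, items.length) },
    () => worker()
  );
  await Promise.all(workers);
  return results;
}
```

Used with the completion call from `batchProcess`, this replaces the slice-and-wait loop with a steady pipeline of at most `concurrency` open requests.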
Choosing Between OpenAI and Anthropic
Choose OpenAI GPT-4 when:
- You need function calling or tool use
- Established integrations matter
- You want cutting-edge capabilities
- Budget is not primary concern
Choose OpenAI GPT-3.5 when:
- Cost efficiency is important
- Task complexity is moderate
- You need proven, stable API
- Speed is critical
Choose Anthropic Claude when:
- Long context windows needed (200K tokens)
- Safety/compliance is crucial
- You need nuanced instruction following
- Cost optimization is key
- Constitutional AI approach appeals to you
Conclusion
Integrating LLMs into web applications requires careful consideration of API selection, cost management, and security practices. Start with the cheaper models for prototyping, test thoroughly before production deployment, and monitor costs closely.
Next Steps:
- Set up API keys with your chosen provider
- Build a simple prototype in your preferred framework
- Implement error handling and retry logic
- Monitor API usage and costs
- Gradually add advanced features like streaming and caching
- Consider moving to more sophisticated models as you optimize
The LLM API landscape continues evolving rapidly. Stay updated with provider documentation and community best practices as new models and features launch.