
Integrating LLMs into Web Apps: OpenAI and Anthropic APIs

Building AI-powered features into web applications has become increasingly practical and affordable. Whether you’re adding intelligent chatbots, content generation, or analytical capabilities, integrating Large Language Models (LLMs) from providers like OpenAI and Anthropic can transform your application’s functionality.

This guide offers practical advice for web developers implementing LLM APIs, covering everything from authentication to production-grade error handling and cost optimization.

Understanding the LLM API Landscape

OpenAI API Overview

OpenAI exposes its models through the Chat Completions API, with GPT-4 and GPT-3.5-turbo as the primary options:

  • GPT-4: Most capable, best for complex reasoning, higher cost (~$0.03-0.06 per 1K tokens input, ~$0.06-0.12 per 1K tokens output)
  • GPT-3.5-turbo: Fast and affordable, excellent for most tasks (~$0.50 per 1M tokens input, ~$1.50 per 1M tokens output)

Best for: General-purpose tasks, function calling, complex reasoning, established API ecosystem

Anthropic Claude API Overview

Anthropic provides the Claude 3 family in increasing capability tiers (Haiku, Sonnet, Opus):

  • Claude 3 Opus: Most advanced, best for complex tasks (~$0.015 per 1K tokens input, ~$0.075 per 1K tokens output)
  • Claude 3 Sonnet: Balanced performance/cost (~$0.003 per 1K tokens input, ~$0.015 per 1K tokens output)
  • Claude 3 Haiku: Fast and cheap (~$0.00025 per 1K tokens input, ~$0.00125 per 1K tokens output)

Best for: Long-context processing (200K tokens), safety-conscious applications, nuanced instruction following
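
Pricing shifts frequently, so treat the figures above as ballpark numbers; the arithmetic, however, stays the same. A minimal sketch of estimating request cost, with rates hard-coded from the bullets above (verify against each provider’s current pricing page):

// Approximate USD rates per 1M tokens, taken from the figures above.
// These go stale quickly -- always check the current pricing pages.
const RATES_PER_MILLION = {
  'gpt-3.5-turbo': { input: 0.5, output: 1.5 },
  'claude-3-sonnet': { input: 3.0, output: 15.0 },
} as const;

function estimateCostUSD(
  model: keyof typeof RATES_PER_MILLION,
  inputTokens: number,
  outputTokens: number
): number {
  const rate = RATES_PER_MILLION[model];
  return (
    (inputTokens / 1_000_000) * rate.input +
    (outputTokens / 1_000_000) * rate.output
  );
}

// Example: 2,000 input + 500 output tokens on gpt-3.5-turbo
// => 0.002 * $0.50 + 0.0005 * $1.50 = ~$0.00175
console.log(estimateCostUSD('gpt-3.5-turbo', 2000, 500));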

Key Differences

Aspect                  OpenAI              Anthropic
Token Limit             4K-128K tokens      Up to 200K tokens
Streaming               ✅ Yes              ✅ Yes
Function Calling        ✅ Native support   ⚠️ Via structured output
API Maturity            Mature, stable      Growing, solid
Regional Availability   Global              Global
Safety Features         Moderate            Strong constitutional AI focus
Cost (Entry)            Medium              Low to Very Low

Authentication and API Key Management

Setting Up API Keys

OpenAI Setup:

  1. Create account at https://platform.openai.com
  2. Navigate to API keys section
  3. Generate new secret key
  4. Store securely (never commit to version control)

Anthropic Setup:

  1. Create account at https://console.anthropic.com
  2. Go to API keys
  3. Create new key
  4. Store securely

Secure Key Management

Never store API keys in code. Use environment variables:

# .env.local (never commit this file)
OPENAI_API_KEY=sk-...
ANTHROPIC_API_KEY=sk-ant-...

// Load from environment at startup
const openaiApiKey = process.env.OPENAI_API_KEY;
const anthropicApiKey = process.env.ANTHROPIC_API_KEY;

if (!openaiApiKey) {
  throw new Error('OPENAI_API_KEY not set');
}

For production, use your platform’s secret management (an AWS example follows the list):

  • AWS: AWS Secrets Manager or Parameter Store
  • Vercel: Environment variables in project settings
  • Docker: Use --env or secrets files
  • Kubernetes: Use Secrets objects
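
As one example, a hedged sketch of loading a key from AWS Secrets Manager at startup (the secret name prod/openai-api-key is hypothetical):

import {
  SecretsManagerClient,
  GetSecretValueCommand,
} from '@aws-sdk/client-secrets-manager';

// Fetch the key once at startup, not on every request
async function loadOpenAIKey(): Promise<string> {
  const sm = new SecretsManagerClient({});
  const result = await sm.send(
    new GetSecretValueCommand({ SecretId: 'prod/openai-api-key' })
  );
  if (!result.SecretString) {
    throw new Error('Secret prod/openai-api-key has no string value');
  }
  return result.SecretString;
}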

Basic Integration: Request/Response Patterns

OpenAI Chat Completion

import OpenAI from 'openai';

const client = new OpenAI({
  apiKey: process.env.OPENAI_API_KEY,
});

async function generateContent(prompt: string): Promise<string> {
  const message = await client.chat.completions.create({
    model: 'gpt-3.5-turbo',
    messages: [
      {
        role: 'user',
        content: prompt,
      },
    ],
    temperature: 0.7,
    max_tokens: 500,
  });

  // Extract text from first completion choice
  const content = message.choices[0]?.message?.content;
  if (!content) {
    throw new Error('No response from OpenAI');
  }

  return content;
}

// Usage
const response = await generateContent('Explain async/await in JavaScript');
console.log(response);

Anthropic Claude API

import Anthropic from '@anthropic-ai/sdk';

const client = new Anthropic({
  apiKey: process.env.ANTHROPIC_API_KEY,
});

async function generateContent(prompt: string): Promise<string> {
  const message = await client.messages.create({
    model: 'claude-3-sonnet-20240229',
    max_tokens: 500,
    messages: [
      {
        role: 'user',
        content: prompt,
      },
    ],
  });

  // Extract text from content block
  const textBlock = message.content.find((block) => block.type === 'text');
  if (!textBlock || textBlock.type !== 'text') {
    throw new Error('No text response from Claude');
  }

  return textBlock.text;
}

// Usage
const response = await generateContent('Explain async/await in JavaScript');
console.log(response);

Maintaining Conversation Context

For multi-turn conversations, track message history:

interface Message {
  role: 'user' | 'assistant';
  content: string;
}

class ConversationManager {
  private messages: Message[] = [];

  addUserMessage(content: string): void {
    this.messages.push({ role: 'user', content });
  }

  addAssistantMessage(content: string): void {
    this.messages.push({ role: 'assistant', content });
  }

  async getResponse(client: OpenAI): Promise<string> {
    const response = await client.chat.completions.create({
      model: 'gpt-3.5-turbo',
      messages: this.messages,
      temperature: 0.7,
    });

    const assistantMessage =
      response.choices[0]?.message?.content || '';
    this.addAssistantMessage(assistantMessage);

    return assistantMessage;
  }

  getHistory(): Message[] {
    return this.messages;
  }

  clear(): void {
    this.messages = [];
  }
}
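
Usage is straightforward (a sketch; the manager keeps history in memory only, so persist it elsewhere if conversations must survive restarts):

const conversation = new ConversationManager();

conversation.addUserMessage('What is a closure in JavaScript?');
const first = await conversation.getResponse(client);

// The follow-up automatically reuses the accumulated history
conversation.addUserMessage('Show me a short example of one.');
const second = await conversation.getResponse(client);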

Handling Streaming Responses

Streaming delivers tokens as they are generated, which is crucial for a responsive user experience:

OpenAI Streaming

async function* streamOpenAIResponse(
  prompt: string,
  client: OpenAI
): AsyncGenerator<string> {
  // stream: true makes create() return an async iterable of chunks
  const stream = await client.chat.completions.create({
    model: 'gpt-3.5-turbo',
    messages: [{ role: 'user', content: prompt }],
    stream: true,
  });

  for await (const chunk of stream) {
    const content = chunk.choices[0]?.delta?.content;
    if (content) {
      yield content;
    }
  }
}

// Usage in API route
export async function POST(request: Request) {
  const { prompt } = await request.json();

  const encoder = new TextEncoder();
  const readable = new ReadableStream({
    async start(controller) {
      for await (const chunk of streamOpenAIResponse(
        prompt,
        openaiClient
      )) {
        controller.enqueue(encoder.encode(chunk));
      }
      controller.close();
    },
  });

  // The chunks above are raw text, not SSE-framed, so serve them as
  // plain text; use text/event-stream only with "data:" event framing
  return new Response(readable, {
    headers: {
      'Content-Type': 'text/plain; charset=utf-8',
      'Cache-Control': 'no-cache',
    },
  });
}
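
On the client, the streamed body can be consumed incrementally with fetch and a reader. A minimal sketch, assuming the route above is mounted at /api/generate (the endpoint path and onChunk callback are illustrative):

async function consumeStream(
  prompt: string,
  onChunk: (text: string) => void
): Promise<void> {
  const response = await fetch('/api/generate', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ prompt }),
  });

  if (!response.body) throw new Error('No response body');

  const reader = response.body.getReader();
  const decoder = new TextDecoder();

  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    // Render each decoded chunk as it arrives
    onChunk(decoder.decode(value, { stream: true }));
  }
}

// Usage: append chunks to a DOM element (outputEl is hypothetical)
await consumeStream('Explain async/await', (text) => {
  outputEl.textContent += text;
});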

Anthropic Streaming

async function* streamClaudeResponse(
  prompt: string,
  client: Anthropic
): AsyncGenerator<string> {
  const stream = await client.messages.stream({
    model: 'claude-3-sonnet-20240229',
    max_tokens: 500,
    messages: [{ role: 'user', content: prompt }],
  });

  for await (const event of stream) {
    if (
      event.type === 'content_block_delta' &&
      event.delta.type === 'text_delta'
    ) {
      yield event.delta.text;
    }
  }
}

Error Handling and Retry Logic

Production applications require robust error handling:

interface RetryConfig {
  maxRetries: number;
  delayMs: number;
  backoffMultiplier: number;
}

async function callWithRetry<T>(
  fn: () => Promise<T>,
  config: RetryConfig = {
    maxRetries: 3,
    delayMs: 1000,
    backoffMultiplier: 2,
  }
): Promise<T> {
  let lastError: Error | null = null;
  let delay = config.delayMs;

  for (let attempt = 0; attempt <= config.maxRetries; attempt++) {
    try {
      return await fn();
    } catch (error) {
      lastError = error as Error;

      // Don't retry on certain errors
      if (
        error instanceof Error &&
        (error.message.includes('401') ||
          error.message.includes('403') ||
          error.message.includes('invalid_request_error'))
      ) {
        throw error;
      }

      if (attempt < config.maxRetries) {
        console.log(
          `Attempt ${attempt + 1} failed. Retrying in ${delay}ms...`
        );
        await new Promise((resolve) => setTimeout(resolve, delay));
        delay *= config.backoffMultiplier;
      }
    }
  }

  throw lastError || new Error('Max retries exceeded');
}

// Usage
const response = await callWithRetry(() =>
  client.chat.completions.create({
    model: 'gpt-3.5-turbo',
    messages: [{ role: 'user', content: 'Hello' }],
  })
);
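
String matching on error messages is a blunt instrument. The official SDKs throw typed errors that carry an HTTP status, so a more precise retryability check is possible; a sketch using OpenAI.APIError (the Anthropic SDK exposes an analogous Anthropic.APIError):

import OpenAI from 'openai';

// 4xx client errors (except 429) won't succeed on retry;
// 429 rate limits and 5xx server errors are worth retrying
function isRetryable(error: unknown): boolean {
  if (error instanceof OpenAI.APIError) {
    if (error.status === 429) return true;
    return error.status === undefined || error.status >= 500;
  }
  // Network failures without a status are usually transient
  return true;
}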

Cost Optimization Strategies

1. Prompt Caching (OpenAI)

// OpenAI applies automatic prompt caching to repeated prompt prefixes
// on supported models (roughly 1K+ tokens), so keep stable content
// like the system prompt at the start of the message list
const cachedSystemPrompt = `You are a helpful assistant.
Your responses should be concise and accurate.
Always validate user input before processing.`;

async function generateWithCache(userMessage: string) {
  return await client.chat.completions.create({
    model: 'gpt-4-turbo',
    messages: [
      {
        role: 'system',
        content: cachedSystemPrompt,
      },
      {
        role: 'user',
        content: userMessage,
      },
    ],
  });
}

2. Token Management

import { encodingForModel, TiktokenModel } from 'js-tiktoken';

function estimateTokens(text: string, model: TiktokenModel): number {
  // js-tiktoken exports encodingForModel (camelCase)
  const encoding = encodingForModel(model);
  return encoding.encode(text).length;
}

function shouldUseCheaperModel(tokens: number): boolean {
  // Prompt length is only a rough proxy for complexity, but short
  // prompts are usually safe on the cheaper model
  return tokens < 500;
}

async function smartModelSelection(prompt: string) {
  const tokens = estimateTokens(prompt, 'gpt-3.5-turbo');
  const model = shouldUseCheaperModel(tokens)
    ? 'gpt-3.5-turbo'
    : 'gpt-4-turbo';

  return await client.chat.completions.create({
    model,
    messages: [{ role: 'user', content: prompt }],
  });
}

3. Response Caching

const responseCache = new Map<string, string>();
const CACHE_TTL_MS = 3600000; // 1 hour

async function getCachedResponse(
  prompt: string
): Promise<string> {
  // Check cache
  if (responseCache.has(prompt)) {
    console.log('Cache hit');
    return responseCache.get(prompt)!;
  }

  // Generate and cache
  const response = await client.chat.completions.create({
    model: 'gpt-3.5-turbo',
    messages: [{ role: 'user', content: prompt }],
  });

  const content = response.choices[0]?.message?.content || '';
  responseCache.set(prompt, content);

  // Clear cache after TTL
  setTimeout(() => responseCache.delete(prompt), CACHE_TTL_MS);

  return content;
}
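
Two caveats: an in-memory Map is per-process, so multi-instance deployments usually reach for a shared store like Redis; and raw prompts make unwieldy map keys. A sketch of deriving a compact cache key from the model and prompt with Node’s built-in crypto module:

import { createHash } from 'node:crypto';

// Hash model + prompt into a fixed-length, uniform cache key
function cacheKey(model: string, prompt: string): string {
  return createHash('sha256').update(`${model}:${prompt}`).digest('hex');
}

// e.g. responseCache.set(cacheKey('gpt-3.5-turbo', prompt), content);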

Security Best Practices

Input Sanitization

function sanitizeUserInput(input: string): string {
  // Remove null bytes
  let sanitized = input.replace(/\0/g, '');

  // Limit length
  if (sanitized.length > 10000) {
    sanitized = sanitized.substring(0, 10000);
  }

  // Remove potentially dangerous patterns
  sanitized = sanitized.replace(/<script[^>]*>.*?<\/script>/gi, '');

  return sanitized.trim();
}

async function safeGenerate(userPrompt: string) {
  const sanitized = sanitizeUserInput(userPrompt);

  return await client.chat.completions.create({
    model: 'gpt-3.5-turbo',
    messages: [{ role: 'user', content: sanitized }],
  });
}

Output Validation

function isValidResponse(response: string): boolean {
  // Naive keyword deny-list; production systems typically pair this
  // with a moderation endpoint or dedicated policy layer
  const suspiciousPatterns = [
    /api[_-]key/i,
    /password/i,
    /secret/i,
  ];

  return !suspiciousPatterns.some((pattern) =>
    pattern.test(response)
  );
}

async function getValidatedResponse(prompt: string) {
  const response = await client.chat.completions.create({
    model: 'gpt-3.5-turbo',
    messages: [{ role: 'user', content: prompt }],
  });

  const content = response.choices[0]?.message?.content || '';

  if (!isValidResponse(content)) {
    console.warn('Suspicious response detected');
    return 'Unable to process that request safely.';
  }

  return content;
}

Rate Limiting

import { RateLimiter } from 'limiter';

const limiter = new RateLimiter({
  tokensPerInterval: 100,
  interval: 'minute',
});

async function rateLimitedRequest(prompt: string) {
  await limiter.removeTokens(1);

  return await client.chat.completions.create({
    model: 'gpt-3.5-turbo',
    messages: [{ role: 'user', content: prompt }],
  });
}
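
The limiter above is global across all callers. To throttle per user instead, one common pattern keeps a map of limiters keyed by user ID; a sketch (the 20-per-minute quota is an assumed value):

const userLimiters = new Map<string, RateLimiter>();

function getLimiterFor(userId: string): RateLimiter {
  let limiter = userLimiters.get(userId);
  if (!limiter) {
    // 20 requests per user per minute -- tune to your quota
    limiter = new RateLimiter({ tokensPerInterval: 20, interval: 'minute' });
    userLimiters.set(userId, limiter);
  }
  return limiter;
}

async function rateLimitedForUser(userId: string, prompt: string) {
  await getLimiterFor(userId).removeTokens(1);

  return await client.chat.completions.create({
    model: 'gpt-3.5-turbo',
    messages: [{ role: 'user', content: prompt }],
  });
}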

Practical Use Cases

Chat Interface

// Ideal for: Conversational AI, customer support
// Use: Claude 3 Sonnet or GPT-3.5-turbo
// Reason: Cost-effective, context retention

async function chatEndpoint(request: Request) {
  const { messages } = await request.json();

  const response = await client.chat.completions.create({
    model: 'gpt-3.5-turbo',
    messages,
    temperature: 0.7,
  });

  return new Response(
    JSON.stringify({ response: response.choices[0]?.message?.content }),
    { headers: { 'Content-Type': 'application/json' } }
  );
}

Content Generation

// Ideal for: Blog posts, product descriptions, emails
// Use: GPT-4 or Claude 3 Opus
// Reason: Better quality output justifies higher cost

async function generateProductDescription(productName: string) {
  return await client.chat.completions.create({
    model: 'gpt-4-turbo',
    messages: [
      {
        role: 'system',
        content:
          'Write compelling, SEO-friendly product descriptions.',
      },
      {
        role: 'user',
        content: `Generate a product description for: ${productName}`,
      },
    ],
  });
}

Code Analysis

// Ideal for: Code review, documentation generation
// Use: Claude 3 Opus or GPT-4
// Reason: Complex reasoning required

async function analyzeCode(code: string) {
  return await anthropicClient.messages.create({
    model: 'claude-3-opus-20240229',
    max_tokens: 2000,
    messages: [
      {
        role: 'user',
        content: `Review this code and suggest improvements:\n\n${code}`,
      },
    ],
  });
}

Performance Optimization

Prompt Engineering

Write clear, specific prompts to reduce token usage:

// โŒ Inefficient
'Tell me about JavaScript'

// โœ… Efficient
'Explain JavaScript closures with one code example'

// โœ… Even better (Claude-specific)
'Explain JavaScript closures. Respond in 150 words or less.'

Batch Processing

For multiple requests, batch them with a bounded concurrency limit:

async function batchProcess(items: string[]) {
  // Process with concurrency limit
  const batchSize = 10;
  const results = [];

  for (let i = 0; i < items.length; i += batchSize) {
    const batch = items.slice(i, i + batchSize);
    const batchResults = await Promise.all(
      batch.map((item) =>
        client.chat.completions.create({
          model: 'gpt-3.5-turbo',
          messages: [{ role: 'user', content: item }],
        })
      )
    );
    results.push(...batchResults);
  }

  return results;
}
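
Note that Promise.all rejects the entire batch on the first failure. When partial results are acceptable, Promise.allSettled lets the remaining requests finish; a sketch of the same loop with failures collected instead of thrown:

async function batchProcessSettled(items: string[]) {
  const batchSize = 10;
  const results: PromiseSettledResult<unknown>[] = [];

  for (let i = 0; i < items.length; i += batchSize) {
    const batch = items.slice(i, i + batchSize);
    // allSettled never rejects; each result is fulfilled or rejected
    const settled = await Promise.allSettled(
      batch.map((item) =>
        client.chat.completions.create({
          model: 'gpt-3.5-turbo',
          messages: [{ role: 'user', content: item }],
        })
      )
    );
    results.push(...settled);
  }

  const failed = results.filter((r) => r.status === 'rejected');
  console.log(`${results.length - failed.length} ok, ${failed.length} failed`);

  return results;
}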

Choosing Between OpenAI and Anthropic

Choose OpenAI GPT-4 when:

  • You need function calling or tool use
  • Established integrations matter
  • You want cutting-edge capabilities
  • Budget is not the primary concern

Choose OpenAI GPT-3.5 when:

  • Cost efficiency is important
  • Task complexity is moderate
  • You need a proven, stable API
  • Speed is critical

Choose Anthropic Claude when:

  • You need long context windows (up to 200K tokens)
  • Safety/compliance is crucial
  • You need nuanced instruction following
  • Cost optimization is key
  • Constitutional AI approach appeals to you

Conclusion

Integrating LLMs into web applications requires careful consideration of API selection, cost management, and security practices. Start with the cheaper models for prototyping, test thoroughly before production deployment, and monitor costs closely.

Next Steps:

  1. Set up API keys with your chosen provider
  2. Build a simple prototype in your preferred framework
  3. Implement error handling and retry logic
  4. Monitor API usage and costs
  5. Gradually add advanced features like streaming and caching
  6. Consider moving to more sophisticated models as you optimize

The LLM API landscape continues evolving rapidly. Stay updated with provider documentation and community best practices as new models and features launch.
