
Why Server-Sent Events (SSE) Beat WebSockets for Real-Time AI Streaming

WebSockets are overkill. For unidirectional AI text streams, SSE is lighter, faster, and easier to debug. Here is why we chose it for the Arena.

TL;DR

Real-time AI needs to be fast, but it rarely needs to be bi-directional. For streaming LLM tokens to a browser, Server-Sent Events (SSE) offer a simpler, HTTP-native alternative to the heavy handshake of WebSockets.

The "Socket Default"

When developers think "Real-Time," they think "WebSockets" (Socket.io, Pusher, etc.). It's the industry standard for chat apps, multiplayer games, and live dashboards.

But LLM interactions are different. They are primarily Request-Response, just with a very long, streamed response.

Key Insight

The Protocol Match: You send one prompt (Request). The AI sends back 500 tokens over 30 seconds (Stream). You rarely need to interrupt the AI mid-sentence with a bi-directional message.

Comparison: WebSockets vs SSE vs Long Polling

To understand why we chose SSE, let's look at the alternatives.

| Feature | WebSockets | Server-Sent Events (SSE) | Long Polling |
| --- | --- | --- | --- |
| Direction | Bi-directional (full duplex) | Uni-directional (server → client) | Uni-directional (hack) |
| Protocol | TCP (upgrade from HTTP) | HTTP (standard) | HTTP (repeated) |
| Reconnection | Manual logic required | Automatic (native browser) | Manual logic required |
| Firewall | Often blocked (network policies) | Always allowed (port 80/443) | Always allowed |
| Complexity | High (handshakes, heartbeats) | Low (standard request) | Medium (state management) |
| Statefulness | Stateful (sticky connections) | Stateless (mostly) | Stateless |

For a chat app where the user is typing while the other person is typing (Google Docs style), WebSockets are superior. But for AI Generation, where the user submits a job and waits for the output, SSE is the perfect architectural fit.

Why We Chose SSE for the Battle Arena

In AI Boss Battle (see the Software Architecture), the user uploads a file, and then three agents (Aggressor, Defender, Moderator) start talking. The user watches. They don't interrupt. They are spectators in the arena.

This "One-to-Many" broadcast pattern is the perfect use case for Server-Sent Events.

1. HTTP-Native Simplicity

WebSockets require "Upgrading" the connection. This often breaks in corporate environments with strict proxy rules that strip the Upgrade header. SSE is just a standard HTTP request with Content-Type: text/event-stream. It works everywhere standard web browsing works.
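On the wire, each SSE frame is just plain text: an optional event: field followed by data: lines, terminated by a blank line. A tiny formatter (a helper of our own, not any library API) makes the format concrete:

```typescript
// Build one SSE frame. The "event:" field is optional; the blank line
// (double newline) terminates the frame.
function sseFrame(data: unknown, event?: string): string {
  const eventLine = event ? `event: ${event}\n` : '';
  return `${eventLine}data: ${JSON.stringify(data)}\n\n`;
}

// sseFrame({ status: 'spawn' }, 'start')
// → 'event: start\ndata: {"status":"spawn"}\n\n'
```

Because the payload is readable text over a normal HTTP response, you can inspect a live stream with nothing more than curl or the browser's network tab.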

2. The Next.js / Vercel Edge

Vercel's Edge Network handles streaming HTTP responses natively. WebSockets on serverless functions usually require a third-party stateful server (like Pusher, Ably, or a dedicated Node process on EC2).

With SSE, we just return a ReadableStream from our API route, and Vercel handles the connection limits and buffering.

Implementing SSE in Next.js 15

It's shockingly simple. Here is a minimal, working implementation that handles encoding, errors, and stream closure (the token source is stubbed for clarity).

// app/api/stream/route.ts
import { NextRequest } from 'next/server';

export async function GET(req: NextRequest) {
  const encoder = new TextEncoder();

  // Placeholder token source. In production this would be an
  // AI SDK stream reader yielding tokens from the model.
  const agentStream = ['The ', 'Aggressor ', 'attacks.'];

  // Create a ReadableStream
  const stream = new ReadableStream({
    async start(controller) {
      // 1. Send initial event
      controller.enqueue(encoder.encode('event: start\ndata: {"status":"spawn"}\n\n'));

      try {
        // 2. Stream each token as an SSE frame: "data: <payload>\n\n"
        for (const token of agentStream) {
          controller.enqueue(encoder.encode(`data: ${JSON.stringify(token)}\n\n`));

          // Add artificial delay to simulate "typing" if needed
          await new Promise(r => setTimeout(r, 10));
        }
      } catch (err) {
        controller.enqueue(encoder.encode('event: error\ndata: "Stream failed"\n\n'));
      } finally {
        // 3. Close stream
        controller.enqueue(encoder.encode('event: done\ndata: {}\n\n'));
        controller.close();
      }
    }
  });

  // Return with specific headers
  return new Response(stream, {
    headers: {
      'Content-Type': 'text/event-stream',
      'Cache-Control': 'no-cache',
      'Connection': 'keep-alive', // Keeps the socket open over HTTP/1.1
      'X-Accel-Buffering': 'no', // Critical for Nginx proxies
    },
  });
}

The Client-Side Consumption

On the frontend, we don't even need a library. The browser's native EventSource API handles it. However, EventSource only supports GET requests. If we need to send a large prompt (like a whole file), we usually use fetch with a stream reader.
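For the simple GET case, the native API is all you need. A minimal sketch (the endpoint path and payload shape are assumptions for illustration; `parseBattleEvent` is our own helper):

```typescript
// Assumed payload shape for illustration.
type BattleEvent = { token?: string; status?: string };

// Parse one SSE "data:" payload into a battle event.
function parseBattleEvent(raw: string): BattleEvent {
  return JSON.parse(raw);
}

// In the browser, EventSource handles framing and reconnection natively.
if (typeof window !== 'undefined') {
  const source = new EventSource('/api/stream'); // GET only -- no POST body
  source.onmessage = (e) => {
    const battleEvent = parseBattleEvent(e.data);
    console.log('token:', battleEvent.token);
  };
  // Close on the server's custom "done" event so the browser stops retrying.
  source.addEventListener('done', () => source.close());
}
```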

Here is the robust fetch version (which supports POST bodies):

async function fetchStream(url: string, body: any) {
  const response = await fetch(url, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(body),
  });

  const reader = response.body?.getReader();
  const decoder = new TextDecoder();

  if (!reader) return;

  // Buffer partial lines: a network chunk can split an SSE frame in half
  let buffer = '';

  while (true) {
    const { done, value } = await reader.read();
    if (done) break;

    buffer += decoder.decode(value, { stream: true });
    const lines = buffer.split('\n');
    buffer = lines.pop() ?? ''; // keep the incomplete trailing line

    for (const line of lines) {
      if (line.startsWith('data: ')) {
        const data = JSON.parse(line.slice(6));
        updateBattleState(data); // your app's state handler
      }
    }
  }
}

Handling Reconnection and Errors

One advantage of the native EventSource API is that it automatically reconnects if the internet blips. It even sends a Last-Event-ID header so the server knows where to resume.
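If you do want resumability, the mechanics are small. A sketch under stated assumptions (the helper names are ours, and it presumes the server buffers generated tokens so it can replay them): tag each frame with an `id:` field, then on reconnect map the browser's `Last-Event-ID` header back to a position in that buffer.

```typescript
// Build an SSE frame carrying an "id:" field. The browser remembers the
// last id it saw and echoes it in the Last-Event-ID header on reconnect.
function frameWithId(id: number, data: unknown): string {
  return `id: ${id}\ndata: ${JSON.stringify(data)}\n\n`;
}

// Map a Last-Event-ID header value to the index to resume from.
// No header (fresh connection) or a garbage value means start over.
function resumeIndex(lastEventId: string | null): number {
  if (lastEventId === null) return 0;
  const n = Number(lastEventId);
  return Number.isInteger(n) && n >= 0 ? n + 1 : 0;
}
```

In a route handler you would call `resumeIndex(req.headers.get('last-event-id'))` and replay only the tokens from that index onward; the hard part is not the protocol but keeping a replayable server-side buffer of a non-deterministic generation.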

With fetch, you have to implement this retry logic yourself. For AI Boss Battle, we decided that a dropped connection = a failed battle. We simply show a "Network Error / Retry" button. The complexity of resuming a non-deterministic AI generation stream wasn't worth the engineering effort for an MVP.

Conclusion: Right Tool, Right Job

If you are building a real-time multiplayer game like Fortnite (UDP) or a collaborative editor like Figma (WebSockets), use the heavy protocols. You need that sub-millisecond bi-directional state sync.

But if you are just streaming text from a robot to a human? SSE is the way. It's lighter, simpler, debuggable with curl, and fits perfectly into the modern serverless stack.
