
Rendering User Markdown Safely in 2026: The Sandbox Strategy

How to let users upload Markdown files without opening yourself up to XSS attacks. A guide to MDX-Remote, sanitization, and the Shadow DOM.


Untrusted input is the #1 vector for attacks. Allowing users to upload Markdown files and then rendering them as HTML is inherently risky. This post details the sanitization pipeline we use to ensure the Battle Arena is safe.

The Challenge: Rich Text vs. XSS

The core value prop of AI Boss Battle (outlined in our Manifesto) is that we take your text files (.md, .txt) and make them better. To show you the result, we have to render that Markdown in the browser.

But Markdown isn't just bold text and lists. GitHub Flavored Markdown allows raw inline HTML to pass straight through, which means standard Markdown parsers will happily render:

<div onclick="alert('Stealing cookies...')">Click me</div>

If we just rendered the user's file directly, we would be building a Stored XSS (Cross-Site Scripting) vulnerability. An attacker could upload a "blog post" that, when viewed by an admin, steals their session token.
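To make the risk concrete, here is a toy sketch (illustrative only, not our code) of what a naive "render the file as-is" pipeline does with that payload. The hypothetical `renderNaive` converts bold syntax and passes everything else, including raw HTML, straight through:

```typescript
// A naive renderer that only converts **bold** and forwards everything
// else (including raw HTML) untouched -- the classic mistake.
function renderNaive(markdown: string): string {
  return markdown.replace(/\*\*(.+?)\*\*/g, '<strong>$1</strong>');
}

const upload =
  '**Great post!** <div onclick="alert(\'Stealing cookies...\')">Click me</div>';
const html = renderNaive(upload);

// The event handler survives into the HTML we would serve to every viewer:
console.log(html.includes('onclick')); // true
```

Every admin who opens that "blog post" executes the attacker's handler in their own session.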

Key Insight

The Paradox: We want to support "Rich Features" (tables, callouts, diagrams) while blocking "Active Features" (scripts, event handlers, external objects).

The Stack: next-mdx-remote + rehype

We use next-mdx-remote on the Server Side to parse the content.

Why Server Side?

  1. Computation: Markdown parsing is comparatively expensive. We don't want to block the client's main thread churning through large files.
  2. Security: We can ensure that the sanitization libraries are loaded and enforced in a trusted environment. The client can't bypass a sanitizer that runs on the server.

Layer 1: The Import Filter

We strictly define which components are allowed to be rendered. If a user tries to inject a custom React component (e.g., <SecretAdminPanel />) that isn't in our whitelist, the MDX parser simply treats it as text or ignores it.

// components/mdx/index.tsx
import type { ComponentPropsWithoutRef } from 'react';
import { Callout } from './Callout';
import { StatsGrid } from './StatsGrid';

const components = {
  // Only allow benign components
  Callout,
  StatsGrid,

  // Override standard HTML tags
  img: (props: ComponentPropsWithoutRef<'img'>) => (
    // Force lazy loading and strip the referrer from external image requests
    <img {...props} loading="lazy" referrerPolicy="no-referrer" />
  ),
  a: (props: ComponentPropsWithoutRef<'a'>) => (
    // Force all links to open in a new tab with noopener
    <a {...props} target="_blank" rel="noopener noreferrer" className="text-blue-500 hover:underline" />
  ),
};

// Passed to the renderer as: <MDXRemote {...source} components={components} />

Layer 2: The Rehype Chain

Before the Markdown is converted to React components, it passes through a remark (Markdown AST) and rehype (HTML AST) plugin chain.

This is our actual configuration:

import { serialize } from 'next-mdx-remote/serialize';
import rehypeSanitize, { defaultSchema } from 'rehype-sanitize';
import remarkGfm from 'remark-gfm';

export async function parseMDX(source: string) {
  return await serialize(source, {
    mdxOptions: {
      remarkPlugins: [remarkGfm], // Enable Tables, Strikethrough
      rehypePlugins: [
        [
          rehypeSanitize,
          {
            ...defaultSchema,
            attributes: {
              ...defaultSchema.attributes,
              // Allow 'className' for Tailwind styling. Note: also allowing
              // 'style' permits arbitrary inline CSS; drop it if you don't need it.
              '*': ['className', 'style'],
            },
            // Strictly forbid dangerous tags even if they slip through remark
            tagNames: (defaultSchema.tagNames ?? []).filter(
              (tag) => !['script', 'iframe', 'object', 'embed', 'form'].includes(tag)
            ),
          },
        ],
      ],
    },
  });
}

rehype-sanitize is the hero here. It walks the abstract syntax tree (AST) and physically removes any node that doesn't match the schema. It doesn't just hide offending elements with CSS; they never reach the DOM at all.
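The mechanics are easy to picture with a miniature version. The sketch below is illustrative only (not rehype-sanitize's actual code, and the allowlists are hypothetical): it walks a hast-style tree, drops any element whose tag isn't in the schema, and strips attributes that aren't allowlisted:

```typescript
// Minimal hast-like node shape for illustration
type Node = {
  type: 'element' | 'text';
  tagName?: string;
  properties?: Record<string, string>;
  value?: string;
  children?: Node[];
};

const ALLOWED_TAGS = new Set(['p', 'strong', 'a', 'div']);
const ALLOWED_ATTRS = new Set(['href', 'className']);

function sanitize(node: Node): Node | null {
  if (node.type === 'text') return node;
  // Disallowed elements are removed from the tree, not hidden
  if (!node.tagName || !ALLOWED_TAGS.has(node.tagName)) return null;
  return {
    ...node,
    // Keep only allowlisted attributes (onclick et al. are dropped)
    properties: Object.fromEntries(
      Object.entries(node.properties ?? {}).filter(([key]) => ALLOWED_ATTRS.has(key))
    ),
    children: (node.children ?? [])
      .map((child) => sanitize(child))
      .filter((child): child is Node => child !== null),
  };
}

const tree: Node = {
  type: 'element',
  tagName: 'div',
  properties: { onclick: "alert('Stealing cookies...')" },
  children: [
    { type: 'text', value: 'Click me' },
    { type: 'element', tagName: 'script', children: [] },
  ],
};

// Both the onclick attribute and the <script> child are gone entirely:
console.log(JSON.stringify(sanitize(tree)));
```

The real library does the same walk against its configurable schema, which is why nothing dangerous survives into the rendered output.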

Layer 3: CSP (Content Security Policy)

As a final fail-safe ("Defense in Depth"), our application headers enforce a strict Content Security Policy. This is configured in next.config.js.

const cspHeader = `
    default-src 'self';
    script-src 'self' 'unsafe-eval' 'unsafe-inline' https://va.vercel-scripts.com;
    style-src 'self' 'unsafe-inline';
    img-src 'self' blob: data:;
    font-src 'self';
    object-src 'none';
    base-uri 'self';
    form-action 'self';
    frame-ancestors 'none';
    upgrade-insecure-requests;
`
  • object-src 'none': Blocks plugin content (Flash, Java applets).
  • frame-ancestors 'none': Prevents Clickjacking (our site cannot be embedded in an iframe).
  • script-src: Only allows scripts from our own domain and Vercel Analytics.
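For completeness, here is roughly how such a header gets attached, following Next.js's documented headers() API in next.config.js (a sketch; adjust the route matcher and directives to your app):

```javascript
// next.config.js
const cspHeader = `
    default-src 'self';
    script-src 'self' 'unsafe-eval' 'unsafe-inline' https://va.vercel-scripts.com;
    object-src 'none';
    frame-ancestors 'none';
`;

module.exports = {
  async headers() {
    return [
      {
        source: '/(.*)', // apply to every route
        headers: [
          {
            key: 'Content-Security-Policy',
            // CSP headers must be a single line, so collapse the template string
            value: cspHeader.replace(/\s{2,}/g, ' ').trim(),
          },
        ],
      },
    ];
  },
};
```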

Even if a hacker found a 0-day vulnerability in next-mdx-remote and managed to render a <script> tag pointing at their own domain, the browser would check this CSP header, see that the source isn't trusted, and refuse to run it. One honest caveat: the 'unsafe-inline' and 'unsafe-eval' allowances above (concessions to Next.js tooling) weaken this guarantee for injected inline scripts, so in production they should be replaced with nonces or hashes.

Layer 4: The "Sandboxed" Iframe (Optional)

For extremely high-risk content (like allowing users to write custom JS/HTML code), we would use a Sandboxed Iframe. <iframe sandbox="allow-scripts" src="..."> creates a completely separate origin. The code inside cannot access the cookies or local storage of the parent site.

For AI Boss Battle, we deemed this overkill since we only support Markdown, but for a tool like CodePen or Replit, this is mandatory.
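If we ever did need Layer 4, the wiring is small. Here is a sketch (a hypothetical helper, not in our codebase): build the untrusted document as a srcdoc string and mount it in an iframe whose sandbox grants scripts but deliberately withholds allow-same-origin, so the content runs under an opaque origin with no access to the parent's cookies or storage:

```typescript
// Hypothetical helper: produce the attributes for a sandboxed preview iframe.
// Omitting 'allow-same-origin' is the critical part -- the framed document
// gets an opaque origin and cannot read the embedding page's cookies/storage.
function sandboxedIframeAttrs(untrustedHtml: string): { sandbox: string; srcDoc: string } {
  return {
    sandbox: 'allow-scripts', // scripts may run, but only in an isolated origin
    srcDoc: `<!DOCTYPE html><html><body>${untrustedHtml}</body></html>`,
  };
}

const attrs = sandboxedIframeAttrs('<script>document.cookie</script>');
// In React this would render as: <iframe {...attrs} title="preview" />
console.log(attrs.sandbox.includes('allow-same-origin')); // false
```

Note that pairing allow-scripts with allow-same-origin would defeat the whole point: the framed code could then reach back into the parent's origin.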

Conclusion: Trust No One

In the Agentic Web, files are moving freely between humans and AIs. Ensuring that these payloads are safe to render is the foundation of a trustworthy platform.

Security is not a single feature; it's a series of layers. By combining input validation (TypeScript), AST sanitization (rehype-sanitize), and browser enforcement (CSP), we create a fortress that allows for creativity without compromising safety.
