Server-side rendering should be as fast as possible. I've been working on this problem for a while now (with server-reason-react and html_of_jsx) and I have plenty of optimizations to do.
Each time I dig into performance, I arrive at the same insight: the first step is to avoid doing work that can be done earlier (and do it only once).
This post explains how static analysis can eliminate runtime overhead, making html_of_jsx between 2x and 12x faster. It also covers some assumptions that I thought were clever but turned out to be crap.
#What is html_of_jsx
html_of_jsx is an OCaml/Reason library that lets you write JSX, create your components and finally render as HTML, mostly for server-rendered applications, static sites or even HTML emails.
let page =
<html lang="en">
<head><title>(JSX.string "My Page")</title></head>
<body>
<h1>(JSX.string "Hello, World!")</h1>
</body>
</html>
let html = JSX.render page

Examples in this post use mlx, an OCaml syntax extension that enables JSX.
#The baseline: building trees at runtime
Since HTML is a tree, the naive JSX implementation builds one too.
In the original implementation, the ppx (preprocessor) transforms JSX syntax into function calls that build a tree structure at runtime:
(* You write *)
<div id="container">
<span>(JSX.string "Hello")</span>
</div>
(* ppx transforms to *)
JSX.node "div"
[ ("id", `String "container") ]
[ JSX.node "span" [] [ JSX.string "Hello" ] ]

The JSX.node function creates a Node from the type:
type element =
| Null
| String of string
| Unsafe of string
| Node of {
tag: string;
attributes: attribute list;
children: element list
}
| List of element list

At runtime, the JSX.render function walks this tree recursively to generate the HTML.
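To make the cost concrete, here is a minimal sketch of what such a recursive renderer could look like. It is not the library's actual code: attributes are simplified to plain string pairs, void elements are ignored, and `add_escaped`, `write` and `render` are illustrative names with an assumed set of escape entities.

```ocaml
(* Simplified, hypothetical model: attributes are plain (name, value) pairs
   rather than the polymorphic variants shown in the post. *)
type element =
  | Null
  | String of string
  | Unsafe of string
  | Node of {
      tag : string;
      attributes : (string * string) list;
      children : element list;
    }
  | List of element list

(* Write [s] into [buf], escaping HTML special characters.
   The exact entities are an assumption. *)
let add_escaped buf s =
  String.iter
    (function
      | '&' -> Buffer.add_string buf "&amp;"
      | '<' -> Buffer.add_string buf "&lt;"
      | '>' -> Buffer.add_string buf "&gt;"
      | '"' -> Buffer.add_string buf "&quot;"
      | '\'' -> Buffer.add_string buf "&#x27;"
      | c -> Buffer.add_char buf c)
    s

(* Walk the tree recursively: every node costs a pattern match plus a
   traversal of its attribute and children lists. *)
let rec write buf = function
  | Null -> ()
  | String s -> add_escaped buf s
  | Unsafe s -> Buffer.add_string buf s
  | List children -> List.iter (write buf) children
  | Node { tag; attributes; children } ->
      Buffer.add_char buf '<';
      Buffer.add_string buf tag;
      List.iter
        (fun (name, value) ->
          Buffer.add_char buf ' ';
          Buffer.add_string buf name;
          Buffer.add_string buf "=\"";
          add_escaped buf value;
          Buffer.add_char buf '"')
        attributes;
      Buffer.add_char buf '>';
      List.iter (write buf) children;
      Buffer.add_string buf "</";
      Buffer.add_string buf tag;
      Buffer.add_char buf '>'

let render element =
  let buf = Buffer.create 128 in
  write buf element;
  Buffer.contents buf
```

Even in this reduced form, rendering a constant element means building the tree first and then taking it apart again on every call.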
This works. But consider what happens for a simple static element like <div id="container" />:
- Allocate a Node record (4 words: 1 header + 3 fields)
- Allocate the attributes list (~14 words: cons cells + tuples + polymorphic variants + strings)
- Pattern match in render to identify it's a Node
- Pattern match on each attribute to render it
In OCaml, every heap-allocated value has a 1-word header and each field is another word. A single <div id="container" /> ends up allocating around 18 words (~144 bytes on 64-bit) just to produce a 26-character string:
<div id="container"></div>

A real page like ocaml.org has 590 HTML nodes, which adds up to almost 85KB of allocations for static content. There's a lot of room for improvement.
#Rethinking the model
Since the ppx transformation runs at build time, the obvious optimization is to pre-render purely static elements into string literals.
That's a good idea, but real components aren't purely static. They mix fixed HTML with dynamic data:
let card ~title ~content =
<div id="card">
(JSX.string title)
<p>(JSX.string content)</p>
</div>

With the tree model, we're forced to allocate records and lists for the div and p nodes, just to put title and content in the right place. We need a different perspective.
Instead of seeing it as a tree, we can see it as a template:
<div id="card"> ├───── Static
┌─────────────────┐
│ {title} ├─ Dynamic
└─────────────────┘
<p> ├───── Static
┌─────────────────┐
│ {content} ├─ Dynamic
└─────────────────┘
</p> ├───── Static
</div> ├───── Static
#Inspiration
Separating the static structure from the dynamic values is a technique well-explored in other high-performance systems:
- Million.js: Optimizing compiler for React
- blaze-html: Haskell's blazingly fast HTML combinator library
- Play Framework Twirl Templates: Scala's compiled template engine
- Partial Evaluation and Automatic Program Generation
#The strategy
To implement this, I introduced a static analyzer step in the preprocessor that walks each JSX element during the build step and classifies it:
- Analyze attributes: Are all attribute values literals (id="container") or do some depend on runtime values (id={className})?
- Analyze children: Are children static text, static nested elements, or dynamic expressions?
- Decide: If everything is static, compute the HTML string during compilation and merge static portions together. Otherwise, bail out and generate code that builds the string at runtime.
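The classification can be illustrated with a toy analyzer in plain OCaml. This is only a sketch under simplifying assumptions, not the ppx itself: the `child`, `element` and `segment` types and the `analyze` function are hypothetical, dynamic expressions are represented by their names, attributes are always literals, and literal text is assumed to be already escaped.

```ocaml
(* Hypothetical build-time view of a JSX element. *)
type child =
  | Text of string (* literal text, known at build time *)
  | Expr of string (* dynamic expression, identified by name *)
  | Element of element

and element = {
  tag : string;
  attrs : (string * string) list; (* literal attributes only *)
  children : child list;
}

type segment =
  | Static of string (* pre-rendered HTML *)
  | Hole of string (* value to be written at runtime *)

(* Flatten an element into segments, in document order. *)
let rec segments_of_element { tag; attrs; children } =
  let attrs_html =
    String.concat ""
      (List.map (fun (k, v) -> Printf.sprintf " %s=\"%s\"" k v) attrs)
  in
  [ Static (Printf.sprintf "<%s%s>" tag attrs_html) ]
  @ List.concat_map segments_of_child children
  @ [ Static (Printf.sprintf "</%s>" tag) ]

and segments_of_child = function
  | Text s -> [ Static s ]
  | Expr name -> [ Hole name ]
  | Element e -> segments_of_element e

(* Merge adjacent static segments so each run becomes one string literal. *)
let rec merge = function
  | Static a :: Static b :: rest -> merge (Static (a ^ b) :: rest)
  | segment :: rest -> segment :: merge rest
  | [] -> []

let analyze element = merge (segments_of_element element)
```

A result consisting of a single `Static` segment corresponds to a fully static element. Anything else becomes a sequence of buffer writes, one per segment.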
#Compile-time rendering
For fully static elements, the ppx runs the rendering at compile time and emits a string literal:
(* You write *)
<div id="container"><span>"Hello"</span></div>
(* ppx transforms to *)
JSX.unsafe "<div id=\"container\"><span>Hello</span></div>"
(* NOTE: `JSX.unsafe` creates an Unsafe variant
and avoids some HTML escaping logic *)

The string "<div id=\"container\"><span>Hello</span></div>" is computed during compilation: escaping attributes, building the tag structure, etc. At runtime, it's just a constant string. Zero computation, zero allocation.
For elements with dynamic content, we can't pre-compute the full string, so we generate Buffer-based code that assembles the HTML at runtime:
(* Has dynamic child - can't fully pre-compute *)
let greet name = <div>(JSX.string name)</div>
(* ppx emits *)
let greet name =
let buf = Buffer.create 128 in
Buffer.add_string buf "<div>";
JSX.write buf (JSX.string name);
Buffer.add_string buf "</div>";
JSX.unsafe (Buffer.contents buf)

Notice that even here, the static parts ("<div>" and "</div>") are pre-computed string literals. Only the dynamic name is processed at runtime.
#The results
I benchmarked using OCaml's Benchmark library, measuring throughput (renders per second) with multiple iterations to account for variance.
JSX.render <div class="container"></div>
Baseline (JSX.node): ~8M renders/sec
After: ~27M renders/sec → ~3x faster
For nested static elements, the improvement is more dramatic:
JSX.render <div><header><h1>Title</h1></header><main>...</main></div>
Baseline (JSX.node): ~2M renders/sec
After: ~27M renders/sec → ~12x faster
The deeper the nesting, the more tree construction and traversal the static path skips.
#Small wins that added up
#Eliminating wrapper allocations
Dynamic strings like <div>(JSX.string name)</div> were still inefficient. JSX.string wraps the value in a String variant that eventually gets passed to JSX.write, which immediately unwraps it.
I updated the ppx to detect JSX.string at compile time and generate a direct call to a new JSX.escape function.
(* Before: allocate wrapper, pattern match, unwrap *)
JSX.write buf (JSX.string name);
(* After: direct call, no allocation *)
JSX.escape buf name;

This simple change made rendering dynamic strings 34% faster (from ~15.5M to ~20.8M renders/sec).
#Happy path with zero-allocation escaping
Most user-generated content doesn't contain HTML special characters (<, >, &, ", ').
I implemented a "scan-first" strategy:
- Scan the string to find the first character that needs escaping
- If none found, return the original string untouched (zero allocations)
- If found, start escaping from that position onward, skipping the already-scanned prefix
The common case now allocates nothing. The less common case pays for the initial scan, but that's probably inevitable.
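The scan-first steps above can be sketched as a standalone function. This is an illustration rather than the library's `JSX.escape` (which writes into a buffer). The names `needs_escape`, `first_special` and `escape_html`, and the choice of `&#x27;` for the single quote, are assumptions.

```ocaml
(* The five characters that need escaping. *)
let needs_escape = function
  | '<' | '>' | '&' | '"' | '\'' -> true
  | _ -> false

(* Index of the first character that needs escaping,
   or the string length if there is none. *)
let first_special s =
  let len = String.length s in
  let rec scan i =
    if i < len && not (needs_escape s.[i]) then scan (i + 1) else i
  in
  scan 0

let escape_html s =
  let len = String.length s in
  let start = first_special s in
  if start = len then s (* fast path: return the same string, no copy *)
  else begin
    let buf = Buffer.create (len + 16) in
    (* The prefix before [start] is known to be clean: copy it as-is. *)
    Buffer.add_substring buf s 0 start;
    for i = start to len - 1 do
      match s.[i] with
      | '<' -> Buffer.add_string buf "&lt;"
      | '>' -> Buffer.add_string buf "&gt;"
      | '&' -> Buffer.add_string buf "&amp;"
      | '"' -> Buffer.add_string buf "&quot;"
      | '\'' -> Buffer.add_string buf "&#x27;"
      | c -> Buffer.add_char buf c
    done;
    Buffer.contents buf
  end
```

In the fast path the result is physically the same value as the input, so `escape_html s == s` holds for strings without special characters.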
#Detours: ideas that didn't work out
Performance optimization is humbling. Most "obvious" improvements turn out to be slower, equivalent, or only faster in edge cases. Here are some approaches I tried that seemed brilliant at the time:
- Pre-computing exact buffer sizes: The overhead of calculating the final size exceeded the savings from avoiding reallocation.
- Eliminating the element type entirely: If everything becomes JSX.unsafe(string), why keep the variant type? Because JSX.write still needs to handle unknown elements passed as children, JSX.null needs semantic representation, and JSX.list requires deferred composition.
- Inlining JSX.escape at every call site: The compiler already inlines small functions, and the code bloat hurt instruction cache performance.
- Using Bytes instead of Buffer: Manual byte manipulation for "more control". Buffer is already optimized for this exact use case.
- Avoiding the fast-path check in JSX.escape: at first, the scan-then-escape approach seemed wasteful (two passes). But in the common case (strings that don't need escaping), returning the original pointer is a win.
#Conclusion
After these optimizations, the ppx classifies elements into four tiers:
| Tier | Pattern | Generated code |
|---|---|---|
| 1. Fully Static | All literals | JSX.unsafe("...") |
| 2. Static + String Holes | JSX.string(expr) children | Buffer + JSX.escape |
| 3. Static + Element Holes | Component/element children | Buffer + JSX.write |
| 4. Dynamic Structure | Dynamic attributes | JSX.node(...) |
The improvement scales with how static your content is. Mostly-static pages (landing/emails) see up to 10x gains. Typical mixed pages see 2-4x improvements.
In retrospect, the original implementation was correct, but correctness and efficiency are different goals. Performance often bends correctness (and sometimes maintainability too!).
Once I knew some of the work only needs to happen once, the optimizations became obvious, so again: the fastest code is code that doesn't run at all.
html_of_jsx is open source at github.com/davesnx/html_of_jsx. The optimizations described here are available starting from version 0.0.7.
If you have ideas for more optimizations, I'd love to hear them -> open an issue!
Benchmarks were run on Apple Silicon (M1) with OCaml 5.4.0. Results will vary based on hardware, OCaml version, and content characteristics.