Making html_of_jsx ~10x faster
Server-side rendering should be as fast as possible. I've been working on this problem for a while now (with server-reason-react and html_of_jsx) and I have plenty of optimizations to do.
Each time I dig into performance, I arrive at the same insight: the first step is to avoid doing work that can be done earlier (and do it only once).
This post explains how static analysis can eliminate runtime overhead, making html_of_jsx between 2x and 12x faster. It also covers some assumptions that I thought were clever but turned out to be crap.
#What is html_of_jsx
html_of_jsx is an OCaml/Reason library that lets you write JSX and render it to HTML strings for server-side rendering:
let page =
<html lang="en">
<head><title>(JSX.string "My Page")</title></head>
<body>
<h1>(JSX.string "Hello, World!")</h1>
</body>
</html>
let html = JSX.render page
Examples in this post use mlx, an OCaml syntax extension that enables JSX.
It's designed for building server-rendered applications, static sites or even HTML emails.
#The baseline: building trees at runtime
Since HTML is a tree, the naive JSX implementation builds one too.
In the original implementation, the ppx (preprocessor) transforms JSX syntax into function calls that build a tree structure at runtime:
(* You write *)
<div id="container">
<span>(JSX.string "Hello")</span>
</div>
(* ppx transforms to *)
JSX.node "div"
[ ("id", `String "container") ]
[ JSX.node "span" [] [ JSX.string "Hello" ] ]
The JSX.node function creates a Node from the type:
type element =
| Null
| String of string
| Unsafe of string
| Node of {
tag: string;
attributes: attribute list;
children: element list
}
| List of element list
At runtime, the JSX.render function walks this tree recursively to generate the HTML.
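To make the baseline concrete, here is a minimal sketch of a tree-walking renderer. It is not the library's code: attributes are simplified to plain string pairs, void elements are ignored, and `escape`, `write` and `render` are hypothetical stand-ins written for this example.

```ocaml
(* Simplified stand-in for the element type: attributes are string pairs here. *)
type element =
  | Null
  | String of string
  | Unsafe of string
  | Node of { tag : string; attributes : (string * string) list; children : element list }
  | List of element list

(* Escape the HTML special characters into [buf] (entity choice is an assumption). *)
let escape buf s =
  String.iter
    (function
      | '&' -> Buffer.add_string buf "&amp;"
      | '<' -> Buffer.add_string buf "&lt;"
      | '>' -> Buffer.add_string buf "&gt;"
      | '"' -> Buffer.add_string buf "&quot;"
      | '\'' -> Buffer.add_string buf "&#x27;"
      | c -> Buffer.add_char buf c)
    s

(* Walk the tree recursively, pattern matching on every node and attribute. *)
let rec write buf = function
  | Null -> ()
  | String s -> escape buf s
  | Unsafe s -> Buffer.add_string buf s
  | List children -> List.iter (write buf) children
  | Node { tag; attributes; children } ->
      Buffer.add_char buf '<';
      Buffer.add_string buf tag;
      List.iter
        (fun (name, value) ->
          Buffer.add_char buf ' ';
          Buffer.add_string buf name;
          Buffer.add_string buf "=\"";
          escape buf value;
          Buffer.add_char buf '"')
        attributes;
      Buffer.add_char buf '>';
      List.iter (write buf) children;
      Buffer.add_string buf "</";
      Buffer.add_string buf tag;
      Buffer.add_char buf '>'

let render element =
  let buf = Buffer.create 128 in
  write buf element;
  Buffer.contents buf

(* The tree from the example above, built by hand. *)
let example =
  Node
    { tag = "div";
      attributes = [ ("id", "container") ];
      children = [ Node { tag = "span"; attributes = []; children = [ String "Hello" ] } ] }
```

Every call to `render example` rebuilds the same string from the same tree, which is the repeated work the rest of the post removes.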
This works. But consider what happens for a simple static element like <div id="container" />:
- Allocate a Node record (4 words: 1 header + 3 fields)
- Allocate the attributes list (~14 words: cons cells + tuples + polymorphic variants + strings)
- Pattern match in render to identify it's a Node
- Pattern match on each attribute to render it
In OCaml, every heap-allocated value has a 1-word header and each field is another word. A single <div id="container" /> ends up allocating around 18 words (~144 bytes on 64-bit) just to produce a 26-character string:
<div id="container"></div>
Looking at a real page like ocaml.org, which has 590 HTML nodes, that adds up to almost 85KB of allocations for static content. There's a lot of room for improvement.
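The estimate is plain arithmetic. The sketch below only reproduces it, and it assumes (as the estimate itself does) that every node on the page costs about as much as the single-attribute div:

```ocaml
(* Figures taken from the text above; only the arithmetic is spelled out. *)
let node_record_words = 4 (* 1 header + 3 fields *)
let attribute_list_words = 14 (* approximate *)
let bytes_per_word = 8 (* 64-bit *)

let words_per_element = node_record_words + attribute_list_words
let bytes_per_element = words_per_element * bytes_per_word

(* Assumption: all 590 nodes cost the same as the example div. *)
let page_nodes = 590
let page_bytes = page_nodes * bytes_per_element

let () =
  Printf.printf "%d words, %d bytes per element, %d bytes per page\n"
    words_per_element bytes_per_element page_bytes
```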
#Rethinking the model
Since the ppx transformation runs at build time, the obvious optimization is to pre-render purely static elements into string literals.
That's a good idea, but real components aren't purely static. They mix fixed HTML with dynamic data:
let card ~title ~content =
<div id="card">
(JSX.string title)
<p>(JSX.string content)</p>
</div>
With the tree model, we're forced to allocate records and lists for the div and p nodes, just to put title and content in the right place. We need a different perspective.
Instead of seeing it as a tree, we can see it as a template:
<div id="card"> ├───── Static
┌─────────────────┐
│ {title} ├─ Dynamic
└─────────────────┘
<p> ├───── Static
┌─────────────────┐
│ {content} ├─ Dynamic
└─────────────────┘
</p> ├───── Static
</div> ├───── Static
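One way to picture the template view in code is a flat list of segments. This is only an illustration: the `segment` type, `card_props` and `render_template` are hypothetical names, escaping is left out, and the ppx emits straight-line code instead of a list like this.

```ocaml
(* Hypothetical representation: a template is a flat list of segments. *)
type 'a segment =
  | Static of string (* known at build time *)
  | Dynamic of ('a -> string) (* computed from the props at render time *)

type card_props = { title : string; content : string }

(* The card from the diagram; adjacent static pieces are already merged. *)
let card_template =
  [ Static "<div id=\"card\">";
    Dynamic (fun p -> p.title);
    Static "<p>";
    Dynamic (fun p -> p.content);
    Static "</p></div>" ]

(* Rendering is a single left-to-right pass: no tree, no recursion. *)
let render_template template props =
  let buf = Buffer.create 128 in
  List.iter
    (function
      | Static s -> Buffer.add_string buf s
      | Dynamic f -> Buffer.add_string buf (f props))
    template;
  Buffer.contents buf
```

Usage: `render_template card_template { title = "Hi"; content = "There" }` fills the two holes and copies the static parts unchanged.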
#Inspiration
Separating the static structure from the dynamic values is a technique well-explored in other high-performance systems:
- Million.js: Optimizing compiler for React
- blaze-html: Haskell's blazingly fast HTML combinator library
- Play Framework Twirl Templates: Scala's compiled template engine
- Partial Evaluation and Automatic Program Generation
#The strategy
To implement this, I introduced a static analyzer step in the preprocessor that walks each JSX element during the build step and classifies it:
- Analyze attributes: Are all attribute values literals (id="container") or do some depend on runtime values (id={className})?
- Analyze children: Are children static text, static nested elements, or dynamic expressions?
- Decide: If everything is static, compute the HTML string during compilation and merge static portions together. Otherwise, bail out and generate code that builds the string at runtime.
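The three steps can be sketched on a toy AST. Everything here is an assumption made for illustration: the real ppx works on OCaml's Parsetree, the `node` and `part` types are invented, expressions are kept as source text, and escaping of static text is omitted.

```ocaml
(* Toy AST: expressions are represented by their source text. *)
type attr_value = Literal of string | Expr of string

type node =
  | Text of string
  | Hole of string (* dynamic child expression *)
  | Element of { tag : string; attrs : (string * attr_value) list; children : node list }

type part = Static of string | Dynamic of string

exception Dynamic_attribute

(* Steps 1 and 2: walk attributes and children, emitting static and dynamic parts. *)
let rec flatten = function
  | Text s -> [ Static s ]
  | Hole e -> [ Dynamic e ]
  | Element { tag; attrs; children } ->
      let attr (name, value) =
        match value with
        | Literal v -> Printf.sprintf " %s=\"%s\"" name v
        | Expr _ -> raise Dynamic_attribute
      in
      let opening = "<" ^ tag ^ String.concat "" (List.map attr attrs) ^ ">" in
      (Static opening :: List.concat_map flatten children) @ [ Static ("</" ^ tag ^ ">") ]

(* Step 3a: merge adjacent static parts into one literal. *)
let rec merge = function
  | Static a :: Static b :: rest -> merge (Static (a ^ b) :: rest)
  | part :: rest -> part :: merge rest
  | [] -> []

(* Step 3b: [None] means "bail out and build the tree at runtime". *)
let analyze node =
  match merge (flatten node) with
  | parts -> Some parts
  | exception Dynamic_attribute -> None
```

A result of `Some [ Static html ]` means the element is fully static and can become a single string literal. In this sketch, any dynamic attribute makes the whole element bail out, which is a simplification.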
#Compile-time rendering
For fully static elements, the ppx runs the rendering at compile time and emits a string literal:
(* You write *)
<div id="container"><span>"Hello"</span></div>
(* ppx transforms to *)
JSX.unsafe "<div id=\"container\"><span>Hello</span></div>"
(* NOTE: `JSX.unsafe` creates an Unsafe variant
and avoids some HTML escaping logic *)
The string "<div id=\"container\"><span>Hello</span></div>" is computed during compilation: escaping attributes, building the tag structure, etc. At runtime, it's just a constant string. Zero computation, zero allocation.
For elements with dynamic content, we can't pre-compute the full string, so we generate Buffer-based code that assembles the HTML at runtime:
(* Has dynamic child - can't fully pre-compute *)
let greet name = <div>(JSX.string name)</div>
(* ppx emits *)
let greet name =
let buf = Buffer.create 128 in
Buffer.add_string buf "<div>";
JSX.write buf (JSX.string name);
Buffer.add_string buf "</div>";
JSX.unsafe (Buffer.contents buf)
Notice that even here, the static parts ("<div>" and "</div>") are pre-computed string literals. Only the dynamic name is processed at runtime.
#The results
I benchmarked using OCaml's Benchmark library, measuring throughput (renders per second) with multiple iterations to account for variance.
JSX.render <div class="container"></div>
Baseline (JSX.node): ~8M renders/sec
After: ~27M renders/sec → ~3x faster
For nested static elements, the improvement is more dramatic:
JSX.render <div><header><h1>Title</h1></header><main>...</main></div>
Baseline (JSX.node): ~2M renders/sec
After: ~27M renders/sec → ~12x faster
The deeper the nesting, the more tree construction and traversal the pre-rendered version avoids.
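For readers who want to try a similar comparison without extra dependencies, here is a rough stdlib-only sketch of a throughput measurement. It is not the Benchmark library setup used for the numbers above, and `throughput`, `render_at_runtime` and `prerendered` are hypothetical names for this example.

```ocaml
(* Call [f] repeatedly for at least [seconds] of CPU time; return calls per second. *)
let throughput ?(seconds = 0.2) f =
  let start = Sys.time () in
  let rec loop count =
    (* Check the clock only every 1000 calls to keep timer overhead low. *)
    for _i = 1 to 1000 do
      ignore (Sys.opaque_identity (f ()))
    done;
    let count = count + 1000 in
    let elapsed = Sys.time () -. start in
    if elapsed >= seconds then float_of_int count /. elapsed else loop count
  in
  loop 0

(* Stand-in for rendering at runtime: assemble the string on every call. *)
let render_at_runtime () =
  let buf = Buffer.create 64 in
  Buffer.add_string buf "<div class=\"";
  Buffer.add_string buf "container";
  Buffer.add_string buf "\"></div>";
  Buffer.contents buf

(* Stand-in for the pre-rendered case: return a constant. *)
let prerendered () = "<div class=\"container\"></div>"

let () =
  Printf.printf "runtime:     %.0f renders/sec\n" (throughput render_at_runtime);
  Printf.printf "prerendered: %.0f renders/sec\n" (throughput prerendered)
```

A loop like this is noisier than a dedicated benchmarking library, so treat its output as a sanity check only.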
#Small wins that added up
#Eliminating wrapper allocations
Dynamic strings like <div>(JSX.string name)</div> were still inefficient. JSX.string wraps the value in a String variant that eventually gets passed to JSX.write, which immediately unwraps it.
I updated the ppx to detect JSX.string at compile time and generate a direct call to a new JSX.escape function.
(* Before: allocate wrapper, pattern match, unwrap *)
JSX.write buf (JSX.string name);
(* After: direct call, no allocation *)
JSX.escape buf name;
This simple change made rendering dynamic strings 34% faster (from ~15.5M to ~20.8M renders/sec).
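The detection itself is a pattern match on the child expression. The sketch below uses a made-up, much simplified expression type in place of OCaml's Parsetree, so `expr`, `emitted` and `emit_child` are assumptions for illustration only.

```ocaml
(* Hypothetical, simplified expression type; the real ppx works on the Parsetree. *)
type expr =
  | Ident of string
  | Apply of string * expr (* function name applied to one argument *)

type emitted =
  | Escape of expr (* JSX.escape buf <expr>; *)
  | Write of expr (* JSX.write buf <expr>; *)

(* If the child is literally [JSX.string arg], skip the wrapper and escape [arg]. *)
let emit_child = function
  | Apply ("JSX.string", arg) -> Escape arg
  | other -> Write other

let rec show_expr = function
  | Ident name -> name
  | Apply (f, arg) -> Printf.sprintf "(%s %s)" f (show_expr arg)

(* Print the statement that would be generated for a child. *)
let show = function
  | Escape e -> Printf.sprintf "JSX.escape buf %s;" (show_expr e)
  | Write e -> Printf.sprintf "JSX.write buf %s;" (show_expr e)
```

Any child that is not a direct `JSX.string` call, such as a call to another component (`render_item` in the test is a made-up name), still goes through `JSX.write`.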
#Happy path with zero-allocation escaping
Most user-generated content doesn't contain HTML special characters (<, >, &, ", ').
I implemented a "scan-first" strategy:
- Scan the string to find the first character that needs escaping
- If none found, return the original string untouched (zero allocations)
- If found, start escaping from that position onward, skipping the already-scanned prefix
The common case now allocates nothing. The less common case pays for an extra scan of the prefix, but that's probably unavoidable.
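The three steps can be sketched like this. It is a minimal version under assumptions: this `escape` returns a string, unlike the buffer-writing JSX.escape shown earlier, and the entity used for the apostrophe is my choice, not necessarily the library's.

```ocaml
let needs_escape = function
  | '<' | '>' | '&' | '"' | '\'' -> true
  | _ -> false

(* Index of the first character that needs escaping, or -1 if there is none. *)
let first_special s =
  let len = String.length s in
  let rec go i =
    if i >= len then -1 else if needs_escape s.[i] then i else go (i + 1)
  in
  go 0

let escape s =
  match first_special s with
  | -1 -> s (* fast path: the same string is returned, nothing is allocated *)
  | first ->
      let len = String.length s in
      let buf = Buffer.create (len + 16) in
      (* Copy the already-scanned clean prefix in one go. *)
      Buffer.add_substring buf s 0 first;
      for i = first to len - 1 do
        match s.[i] with
        | '<' -> Buffer.add_string buf "&lt;"
        | '>' -> Buffer.add_string buf "&gt;"
        | '&' -> Buffer.add_string buf "&amp;"
        | '"' -> Buffer.add_string buf "&quot;"
        | '\'' -> Buffer.add_string buf "&#x27;"
        | c -> Buffer.add_char buf c
      done;
      Buffer.contents buf
```

On the fast path the result is physically the same value as the input, so `escape s == s` holds for strings with no special characters.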
#Detours: ideas that didn't work out
Performance optimization is humbling. Most "obvious" improvements turn out to be slower, equivalent, or only faster in edge cases. Here are some approaches I tried that seemed brilliant at the time:
- Pre-computing exact buffer sizes: The overhead of calculating the final size exceeded the savings from avoiding reallocation.
- Eliminating the element type entirely: If everything becomes JSX.unsafe(string), why keep the variant type? Because JSX.write still needs to handle unknown elements passed as children, JSX.null needs semantic representation, and JSX.list requires deferred composition.
- Inlining JSX.escape at every call site: The compiler already inlines small functions, and the code bloat hurt instruction cache performance.
- Using Bytes instead of Buffer: Manual byte manipulation for "more control". Buffer is already optimized for this exact use case.
- Avoiding the fast-path check in JSX.escape: At first, the scan-then-escape approach seemed wasteful (two passes). But in the common case (strings that don't need escaping), returning the original pointer is a win.
#Conclusion
After these optimizations, the ppx classifies elements into four tiers:
| Tier | Pattern | Generated code |
|---|---|---|
| 1. Fully Static | All literals | JSX.unsafe("...") |
| 2. Static + String Holes | JSX.string(expr) children | Buffer + JSX.escape |
| 3. Static + Element Holes | Component/element children | Buffer + JSX.write |
| 4. Dynamic Structure | Dynamic attributes | JSX.node(...) |
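As a rough illustration of the table, the tiers can be modelled as a variant and computed from a toy AST. The types are hypothetical, and the rule that a parent takes the least static tier found among its children is my assumption, not something the ppx is confirmed to do.

```ocaml
type attr_value = Literal of string | Expr of string

type child =
  | Text of string (* literal text *)
  | String_hole of string (* JSX.string expr *)
  | Element_hole of string (* any other expression producing an element *)
  | Child of element

and element = { tag : string; attrs : (string * attr_value) list; children : child list }

(* Declared from most static to least static, so [max] picks the least static. *)
type tier = Fully_static | String_holes | Element_holes | Dynamic_structure

let rec classify { attrs; children; _ } =
  let is_dynamic (_, value) =
    match value with
    | Expr _ -> true
    | Literal _ -> false
  in
  if List.exists is_dynamic attrs then Dynamic_structure
  else
    List.fold_left
      (fun acc child ->
        let tier =
          match child with
          | Text _ -> Fully_static
          | String_hole _ -> String_holes
          | Element_hole _ -> Element_holes
          | Child e -> classify e
        in
        (* Assumption: the least static child decides the tier of the parent. *)
        max acc tier)
      Fully_static children
```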
The improvement scales with how static your content is. Mostly-static pages (landing/emails) see up to 10x gains. Typical mixed pages see 2-4x improvements.
In retrospect, the original implementation was correct, but correctness and efficiency are different goals. Performance often bends correctness (and sometimes maintainability too!).
Once I knew some of the work only needs to happen once, the optimizations became obvious, so again: the fastest code is code that doesn't run at all.
html_of_jsx is open source at github.com/davesnx/html_of_jsx. The optimizations described here are available starting from version 0.0.7.
If you have ideas for more optimizations, I'd love to hear them -> open an issue!
Benchmarks were run on Apple Silicon (M1) with OCaml 5.4.0. Results will vary based on hardware, OCaml version, and content characteristics.
Thanks for reading! If something's unclear or you think I'm wrong, tell me. Feedback is appreciated.
@davesnx