Online Dev Tools


Print-Ready PDFs from HTML: Paged.js, Headless Chromium, and the Truncated-Render Pitfall


Why HTML and CSS beat document editors for repeatable PDFs

If you produce PDFs more than once — reports, protocols, client deliverables, product documentation — a document editor is the wrong tool. Every revision means manually fixing layout drift, and there is no diff, no version control, and no way to regenerate fifty documents from a template.

HTML and CSS solve all of that. Your document is plain text, so it lives in git. Your design system is a stylesheet, so every document inherits it. And modern CSS paged media gives you the features people assume require InDesign: running headers and footers, page numbers, a table of contents with real page references, and full-bleed cover pages.
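As a sketch of the "fifty documents from one template" point, the snippet below renders one HTML file per record using plain template literals. The record fields, the `report.css` stylesheet name and the `renderReport` helper are hypothetical. A real pipeline might use a templating library instead, but the principle is the same: data in, HTML out, one shared stylesheet.

```javascript
// Escape the characters that are significant in HTML text and attribute values.
function escapeHtml(value) {
  return String(value)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

// Hypothetical template: every document links the same stylesheet, so a design
// change in report.css reaches all of them on the next build.
function renderReport({ client, title, findings }) {
  const items = findings.map((f) => `    <li>${escapeHtml(f)}</li>`).join("\n");
  return `<!DOCTYPE html>
<html>
<head>
  <meta charset="utf-8">
  <title>${escapeHtml(title)}</title>
  <link rel="stylesheet" href="report.css">
</head>
<body>
  <section class="cover"><h1>${escapeHtml(title)}</h1><p>${escapeHtml(client)}</p></section>
  <h2>Findings</h2>
  <ul>
${items}
  </ul>
</body>
</html>
`;
}

// Hypothetical records; in practice these might come from JSON or a database.
const records = [
  { client: "Acme Corp", title: "Q3 Security Review", findings: ["TLS 1.0 enabled", "Admin panel exposed"] },
  { client: "Globex", title: "Q3 Security Review", findings: ["No findings"] }
];

// One output file name and HTML string per record.
const documents = records.map((r) => ({
  file: `${r.client.toLowerCase().replace(/[^a-z0-9]+/g, "-")}.html`,
  html: renderReport(r)
}));

for (const d of documents) {
  console.log(d.file, d.html.length);
}
```

Writing each `html` string to its `file` with `fs.writeFileSync` produces sources that live in git and can be handed to the PDF step one by one.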

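The catch is that browsers implement only part of the paged-media spec. For years Chromium's built-in print pipeline ignored @page margin boxes, so content: counter(page) in a header or footer simply did not render; margin-box support only landed in Chrome 131, and features such as target-counter(), which a table of contents needs for real page references, are still missing. That is the gap Paged.js fills.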

The stack: Paged.js plus the browser you already have

Paged.js is a polyfill. It takes your flowing HTML, slices it into page-sized fragments at render time, and builds the page chrome — headers, footers, counters — as real DOM. The CSS you write is standard paged-media syntax:

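@page {
  size: Letter;
  margin: 0.9in 0.8in 0.7in; /* top, left/right, bottom */
  @top-left { content: "Document Title"; }
  @bottom-right { content: "Page " counter(page) " of " counter(pages); }
}
/* Named page for a full-bleed cover: no margins, no running header */
@page cover { margin: 0; @top-left { content: none; } }
.cover { page: cover; }
/* Keep tables and figures whole, and keep headings with what follows */
table, figure { break-inside: avoid; }
h2 { break-after: avoid; }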

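A few details in that stylesheet save hours. Setting size and margin on @page fixes the geometry of every page in one place. The named cover page drops the margins and the running header, which is what makes a full-bleed cover possible. And break-inside: avoid on tables and figures, together with break-after: avoid on headings, keeps a table from splitting across a page boundary and a heading from being stranded at the bottom of a page.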
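Break rules like these act during fragmentation, and footers can only be written after it. The toy model below is a hypothetical `paginate` function, not Paged.js's actual algorithm: it places blocks of known height (arbitrary units) onto pages, moves a block marked `avoidBreak` (standing in for break-inside: avoid) to the next page when it does not fit, splits everything else, and only then fills in the "Page N of M" footers, because M is unknown until the last page exists.

```javascript
// Toy paginator. Blocks have a height and an optional `avoidBreak` flag that
// stands in for `break-inside: avoid`. Breakable blocks are split across pages;
// unbreakable ones move whole to the next page unless taller than a full page.
function paginate(blocks, pageHeight) {
  const pages = [[]];
  let used = 0;
  for (const block of blocks) {
    let remaining = block.height;
    // Unbreakable block that does not fit here but would fit on a fresh page.
    if (block.avoidBreak && remaining > pageHeight - used && remaining <= pageHeight && used > 0) {
      pages.push([]);
      used = 0;
    }
    while (remaining > 0) {
      const space = pageHeight - used;
      if (space === 0) {
        pages.push([]);
        used = 0;
        continue;
      }
      const slice = Math.min(space, remaining);
      pages[pages.length - 1].push({ id: block.id, height: slice });
      used += slice;
      remaining -= slice;
    }
  }
  // Only now is the page total known, so "of M" can be filled in.
  const total = pages.length;
  return pages.map((content, i) => ({ content, footer: `Page ${i + 1} of ${total}` }));
}

const pages = paginate(
  [
    { id: "intro", height: 70 },
    { id: "table", height: 50, avoidBreak: true },
    { id: "body", height: 120 }
  ],
  100
);

for (const p of pages) {
  console.log(p.footer, p.content.map((c) => `${c.id}:${c.height}`).join(" "));
}
// Page 1 of 3 intro:70
// Page 2 of 3 table:50 body:50
// Page 3 of 3 body:70
```

The table does not fit in the 30 units left on page 1, so it moves whole to page 2, while the breakable body text is split across pages 2 and 3.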

The pitfall: headless print fires before Paged.js finishes

Here is the failure mode that costs people an afternoon. You wire everything up and run the obvious command:

chrome --headless --print-to-pdf=out.pdf document.html

and get a PDF with one page, clipped mid-sentence, with a footer that reads "Page 1 of 0".

Nothing errored. The browser loaded the page, waited for network idle, and printed — but Paged.js repaginates the DOM asynchronously after load. The CLI print pipeline has no idea your document is still rebuilding itself, so it snapshots whatever exists at that moment. On a small test file it may even work, which makes the failure look intermittent when it is actually a race you lose as soon as the document gets long enough to matter.

Flags like --virtual-time-budget do not reliably fix this, and the official pagedjs-cli wrapper downloads its own Chromium build, which is slow, large, and blocked on plenty of locked-down machines.
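The race itself is easy to reproduce without a browser. In this sketch a hypothetical `fakePaginate` function stands in for Paged.js and adds pages over several event-loop turns, and `snapshot` plays the role of the print step. Nothing here is Paged.js or Chromium code; it only models the timing, and it reproduces the "Page 1 of 0" symptom exactly.

```javascript
// Hypothetical stand-in for the document being paginated.
const doc = { pages: [], totalPages: 0 };

// Simulated paginator: the first page appears synchronously, the rest arrive
// over later event-loop turns, and the total is only set at the very end.
async function fakePaginate(pageCount) {
  for (let i = 1; i <= pageCount; i++) {
    doc.pages.push(`page ${i}`);
    if (i < pageCount) await new Promise((resolve) => setTimeout(resolve, 5));
  }
  doc.totalPages = doc.pages.length;
}

// The "print": copies whatever exists at the moment it runs.
function snapshot() {
  return { pages: doc.pages.length, footer: `Page 1 of ${doc.totalPages}` };
}

async function main() {
  const pagination = fakePaginate(18); // kicked off at "load", keeps running
  const early = snapshot();            // a print fired right after load
  await pagination;                    // the equivalent of waiting for the after() hook
  const late = snapshot();
  console.log("early:", early);        // early: { pages: 1, footer: 'Page 1 of 0' }
  console.log("late:", late);          // late: { pages: 18, footer: 'Page 1 of 18' }
  return { early, late };
}

const result = main();
```

With 2 pages instead of 18 the gap shrinks to a few milliseconds, which is why a small test file can appear to work.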

The fix: drive the installed browser and wait for a completion signal

Paged.js exposes lifecycle hooks. Set a flag when pagination finishes:

<script>
  window.PagedConfig = {
    auto: true,
    after: function () { window.__pagedDone = true; }
  };
</script>
<script src="paged.polyfill.js"></script>
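Waiting for that flag from outside the page is polling with a deadline. Puppeteer's page.waitForFunction does this for you (by default it re-checks on every animation frame inside the page), but the logic is small enough to sketch with Node alone. `waitFor` and the `state` object are hypothetical names; the point is that a timeout turns a silently truncated PDF into a loud failure.

```javascript
// Poll `check` every `intervalMs` until it returns true; reject once `timeoutMs`
// has elapsed so a stalled pagination fails the build instead of printing.
function waitFor(check, { timeoutMs = 90000, intervalMs = 100 } = {}) {
  const started = Date.now();
  return new Promise((resolve, reject) => {
    const tick = () => {
      let ok;
      try {
        ok = check();
      } catch (err) {
        reject(err); // a throwing check is a failure, not "not yet"
        return;
      }
      const elapsed = Date.now() - started;
      if (ok) {
        resolve(elapsed);
      } else if (elapsed >= timeoutMs) {
        reject(new Error(`condition not met within ${timeoutMs} ms`));
      } else {
        setTimeout(tick, intervalMs);
      }
    };
    tick();
  });
}

// Usage: a flag that flips after 50 ms stands in for window.__pagedDone.
const state = { pagedDone: false };
setTimeout(() => { state.pagedDone = true; }, 50);

const finished = waitFor(() => state.pagedDone === true, { timeoutMs: 1000, intervalMs: 10 });
finished.then((ms) => console.log(`pagination done after ${ms} ms`));

// A flag that never flips: the promise rejects instead of hanging forever.
const stalled = waitFor(() => false, { timeoutMs: 30, intervalMs: 10 })
  .then(() => "resolved", (err) => err.message);
stalled.then((msg) => console.log(msg)); // condition not met within 30 ms
```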

Then drive the browser you already have installed with puppeteer-core — it is a small package with no bundled Chromium, and it points at Chrome or Edge by executable path:

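const path = require("path");
const { pathToFileURL } = require("url");
const puppeteer = require("puppeteer-core");

// CommonJS has no top-level await, so the steps live in an async function.
async function renderPdf(htmlPath, pdfPath) {
  const browser = await puppeteer.launch({
    // Point at the locally installed Chrome or Edge; adjust per machine.
    executablePath: "C:\\Program Files (x86)\\Microsoft\\Edge\\Application\\msedge.exe",
    headless: "new"
  });
  try {
    const page = await browser.newPage();
    const fileUrl = pathToFileURL(path.resolve(htmlPath)).href;
    await page.goto(fileUrl, { waitUntil: "networkidle0" });
    // Block until the Paged.js after() hook sets the flag; throws after 90 s.
    await page.waitForFunction("window.__pagedDone === true", { timeout: 90000 });
    await page.pdf({
      path: pdfPath,
      printBackground: true,
      preferCSSPageSize: true,
      margin: { top: 0, right: 0, bottom: 0, left: 0 }
    });
  } finally {
    await browser.close(); // close even if the wait times out
  }
}

renderPdf("document.html", "out.pdf").catch((err) => {
  console.error(err);
  process.exitCode = 1;
});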

Two of those options matter more than they look. preferCSSPageSize hands page dimensions to your @page rule instead of the driver's defaults, and the zeroed margins prevent the driver from adding its own margin on top of the ones Paged.js already built into each page. With the wait in place, counter(pages) resolves correctly too — the "of 0" footer disappears because the counter now exists before the snapshot.
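The margin arithmetic is worth doing once by hand. The sketch below expands the CSS margin shorthand from the @page rule earlier (one to four values, standard top/right/bottom/left order) and computes the content area of a US Letter page (8.5in × 11in). The `extraMarginIn` parameter is a hypothetical stand-in for a margin a print driver might add on every side, showing how much area a second, stacked margin would take away.

```javascript
// Expand a CSS margin shorthand (one to four values) into its four sides,
// using the standard CSS rules. Only inch values and a bare 0 are handled.
function expandMargin(shorthand) {
  const values = shorthand.trim().split(/\s+/).map((token) => {
    if (token === "0") return 0;
    const n = parseFloat(token);
    if (!token.endsWith("in") || Number.isNaN(n)) throw new Error(`unsupported length: ${token}`);
    return n;
  });
  if (values.length > 4) throw new Error("margin takes at most four values");
  const [top, right = top, bottom = top, left = right] = values;
  return { top, right, bottom, left };
}

const LETTER = { width: 8.5, height: 11 }; // US Letter, in inches

// Content area left after the @page margins plus an optional extra margin
// (hypothetical) applied on every side.
function contentArea(page, margin, extraMarginIn = 0) {
  const round = (x) => Math.round(x * 1000) / 1000; // hide floating-point noise
  return {
    width: round(page.width - margin.left - margin.right - 2 * extraMarginIn),
    height: round(page.height - margin.top - margin.bottom - 2 * extraMarginIn)
  };
}

const margin = expandMargin("0.9in 0.8in 0.7in");
console.log(margin);                           // { top: 0.9, right: 0.8, bottom: 0.7, left: 0.8 }
console.log(contentArea(LETTER, margin));      // { width: 6.9, height: 9.4 }
console.log(contentArea(LETTER, margin, 0.4)); // { width: 6.1, height: 8.6 }
```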

Verify the output like it is code

A PDF that renders is not a PDF that is correct. Two checks catch most regressions:

  1. Count pages programmatically after every build. A sudden drop from eighteen pages to one is the truncation race reappearing, and it is silent otherwise.
  2. Extract the text layer with a tool like pdftotext and compare it against the previous build. Pasting the two extractions into the Diff Checker shows exactly what changed between renders — useful when a CSS tweak was not supposed to change content at all.
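Both checks are easy to script. The sketch below assumes the poppler utilities are installed: pdfinfo prints a `Pages:` line, and `pdftotext file.pdf -` writes the text layer to standard output. The helper names and the page floor are hypothetical, and the diff is a plain longest-common-subsequence line diff, not the algorithm of any particular diff tool.

```javascript
const { execFileSync } = require("child_process");

// Parse the "Pages:" line that pdfinfo prints.
function parsePageCount(pdfinfoOutput) {
  const match = /^Pages:\s+(\d+)\s*$/m.exec(pdfinfoOutput);
  if (!match) throw new Error("no Pages: line in pdfinfo output");
  return Number(match[1]);
}

// Fail loudly when the page count falls below a floor (e.g. 18 pages -> 1).
function assertPageFloor(count, minPages) {
  if (count < minPages) {
    throw new Error(`PDF has ${count} page(s), expected at least ${minPages}; truncated render?`);
  }
  return count;
}

// Line diff via longest common subsequence: "- " removed, "+ " added.
function diffLines(oldText, newText) {
  const a = oldText.split("\n");
  const b = newText.split("\n");
  // lcs[i][j] = length of the LCS of a[i..] and b[j..]
  const lcs = Array.from({ length: a.length + 1 }, () => new Array(b.length + 1).fill(0));
  for (let i = a.length - 1; i >= 0; i--) {
    for (let j = b.length - 1; j >= 0; j--) {
      lcs[i][j] = a[i] === b[j] ? lcs[i + 1][j + 1] + 1 : Math.max(lcs[i + 1][j], lcs[i][j + 1]);
    }
  }
  const out = [];
  let i = 0;
  let j = 0;
  while (i < a.length && j < b.length) {
    if (a[i] === b[j]) { i++; j++; }
    else if (lcs[i + 1][j] >= lcs[i][j + 1]) out.push(`- ${a[i++]}`);
    else out.push(`+ ${b[j++]}`);
  }
  while (i < a.length) out.push(`- ${a[i++]}`);
  while (j < b.length) out.push(`+ ${b[j++]}`);
  return out;
}

// Wiring for a real build; needs pdfinfo and pdftotext on PATH.
function checkBuild(pdfPath, previousText, minPages) {
  const info = execFileSync("pdfinfo", [pdfPath], { encoding: "utf8" });
  const pages = assertPageFloor(parsePageCount(info), minPages);
  const text = execFileSync("pdftotext", [pdfPath, "-"], { encoding: "utf8" });
  return { pages, text, changes: diffLines(previousText, text) };
}
```

Storing `text` from each build and passing it as `previousText` next time gives an empty `changes` array whenever a CSS tweak left the content alone. The LCS table is quadratic in line count, which is fine for report-sized documents.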

For documents with strict layout requirements, add one manual pass for orphaned elements: a callout box or pull-quote with break-inside: avoid that lands alone on its own page usually means a section needs rebalancing, not more CSS.
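If you can export a per-page summary of the rendered document, that manual pass can start from a shortlist. This sketch assumes a hypothetical summary format, one array of top-level block kinds per page (collected, for instance, with Puppeteer's page.evaluate after pagination), and flags pages whose only content is a single callout or pull-quote. How you build the summary depends on your markup, so treat both the format and the kind names as placeholders.

```javascript
// Hypothetical per-page summary: pages[i] lists the block kinds on page i + 1.
// `loneKinds` are the kinds that should not stand alone on a page.
function findOrphanedCallouts(pages, loneKinds = ["callout", "pull-quote"]) {
  const flagged = [];
  pages.forEach((blocks, index) => {
    if (blocks.length === 1 && loneKinds.includes(blocks[0])) {
      flagged.push({ page: index + 1, kind: blocks[0] });
    }
  });
  return flagged;
}

const summary = [
  ["h1", "p", "p"],
  ["h2", "p", "table"],
  ["callout"], // a break-inside: avoid box pushed onto its own page
  ["h2", "p", "pull-quote", "p"]
];

console.log(findOrphanedCallouts(summary)); // [ { page: 3, kind: 'callout' } ]
```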

The payoff

Once the pipeline exists, a full document rebuild is one command and takes seconds. Design changes are stylesheet edits that apply to every document at once. The source diffs cleanly in review. And nothing in the toolchain costs anything: a polyfill, a small npm package, and the browser already on the machine.

Sources

  1. Original editorial article published by Online Dev Tools
