ADA PDF Remediation Prompt — WCAG 2.1 AA / PDF/UA Compliance
For use with
Claude Opus 4.6 Pro Plan (or higher)
Google Gemini 2.5 Pro
ChatGPT 5.4 Pro
Any generative AI with code execution
Updated 4/21/2026 to handle headers within PDFs better.
Updated 4/13/2026 to handle formulae within PDFs better.
How to use
You will need a paid tier on a generative AI platform. The most cost-effective options are probably ChatGPT Plus ($20/month, includes Code Interpreter), Claude Pro ($20/month, includes code execution), or Gemini Advanced ($20/month). All three give the full vision + code execution + file handling pipeline.
Copy everything below the "Prompt" heading and paste it as your prompt, then upload your PDF in the same message. This has been tested on Claude Opus 4.6 Extended Thinking. It does not work with ZotGPT, because ZotGPT cannot run code in its environment.
Include as a skill in Claude
Note: you can also add this as a skill in Claude. A prompt such as "Make this PDF ADA compliant" is then enough, and you do not need to paste the long prompt below. To do that, upload the contents of this markdown file to Claude's skills under its Customize options. Thank you to Tim Tait for this suggestion!
Prompt
You are an ADA/Section 508 accessibility remediation specialist. I am uploading a PDF that needs to be made fully accessible to comply with **WCAG 2.1 Level AA** and **PDF/UA (ISO 14289)** standards for a University of California campus accessibility review.
**Your task:** Process the uploaded PDF through a complete remediation pipeline and return a new, fully tagged, accessible PDF. Execute all steps by writing and running code — do not just describe what should be done.
---
### STAGE 0 — EXISTING STRUCTURE INSPECTION
Before building anything, inspect what the PDF already has. Many modern authoring tools
(Keynote, Beamer/LaTeX, PowerPoint, InDesign, Word) emit a partial structure tree with
BDC/EMC marked content. Determine:
1. **Does `/StructTreeRoot` already exist?** If yes, walk it and count element types
(`/Figure`, `/Formula`, `/H1`, `/H2`, `/H3`, `/P`, etc.) and check which have `/Alt`
entries.
2. **Do all page content streams already contain `BDC` and `EMC`?** If yes, do NOT
rewrite the content streams — only augment the structure tree metadata.
3. **Are there `/Figure` elements with no `/Alt`?** These need alt-text added.
4. **Are there zero `/Formula` elements but the PDF visually contains equations?**
This is the most common compliance gap in academic slides — equations rendered by
LaTeXiT or similar tools are tagged as `/Figure` by the authoring tool. These must
be promoted to `/Formula` in Stage 4.
5. **Heading inventory.** Count `/H1` / `/H2` / `/H3` … elements and for EACH one check:
(a) does it have a non-empty `/K` pointing to an MCID; (b) does it have `/Pg` bound
to a page; (c) does its MCID correspond to a real BDC…EMC region in that page's
content stream. Headings that fail any of (a)/(b)/(c) are "empty headings" — Panorama
and other campus checkers report them as "no headings found" even when the elements
technically exist in the tree.
6. **Multiple H1s.** If the existing tree contains more than one `/H1`, flag it. WCAG 2.1 and
PDF/UA do not strictly forbid repeated heading levels; PDF/UA-1 explicitly allows a level to
repeat. This pipeline still requires exactly one `/H1` (the document title), because most
campus checkers reject multiple H1s as a hierarchy violation.
Print a summary of the existing structure before proceeding. If the PDF already has a
well-formed tree with BDC/EMC, the remediation strategy is to **augment** (add `/Alt`,
fix heading hierarchy, promote equation figures to `/Formula`, add missing metadata)
rather than rebuild.
---
### STAGE 1 — VISUAL ANALYSIS
Rasterize every page of the PDF (200 DPI recommended) and visually inspect each page
image. Identify **every** visual element that requires alternative text OR that
contributes to the document's heading hierarchy, including:
- **Headings** — any rendered text that visually functions as a title, section header,
or subheading. Classify each with a `level` (1–6). The document title is level 1, slide
or section titles are level 2, subsection headings are level 3, and so on. See "HEADING
HIERARCHY RULES" below — campus accessibility checkers (Panorama in particular) fail
PDFs whose heading hierarchy is broken even when all figures have alt-text.
- **Equations** — inline or display math, formulas, chemical notation, any symbolic
expression rendered as an image or vector graphic.
- **Figures** — photographs, illustrations, diagrams, flowcharts, circuit diagrams,
maps, schematics.
- **Plots and charts** — line plots, bar charts, scatter plots, histograms, pie charts,
box plots.
- **Tables rendered as images** — tables that are not tagged as native PDF table
structures.
- **Logos, icons, and decorative images** — logos need alt-text; purely decorative
images should be marked as artifacts.
- **Rendered prose blocks** — paragraphs of body text rendered as graphics (common in
Keynote/LaTeXiT decks where every text block is a separate rendered element). Tag as
`Paragraph`.
For each element, record:
1. **Page number** (1-indexed)
2. **Element type** — one of: `Heading`, `Paragraph`, `Figure`, `Formula`, `Image`,
`Diagram`, `Table`
3. **Level** — integer 1–6, required for `Heading`, omitted for other types
4. **Text / alt-text**:
   - For **headings**: the verbatim rendered text, preserving original capitalization
     and punctuation. This string is what the pipeline uses to locate the matching MCID.
   - For **paragraphs**: verbatim transcription of the block's text.
   - For **equations**: spoken-English description AND a `LaTeX:` line with the LaTeX
     source (e.g., `f of x equals four over pi times sine of pi x. LaTeX: f(x) = \frac{4}{\pi}\sin(\pi x)`)
   - For **plots/charts**: axes with units, data trend, and key quantitative takeaway
   - For **figures/diagrams**: what is depicted, spatial layout, labels, relevance
   - For **images**: visual content and contextual purpose
   - For non-heading, non-paragraph elements, aim for 1–4 sentences. A screen reader
     user should understand the content without seeing the image.
5. **Approximate bounding box** — percentage coordinates `[x0%, y0%, x1%, y1%]` from
top-left.
Print a summary of all identified elements organized by page before proceeding.
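You can keep Stage 1 output consistent by giving each record a fixed shape and checking it before moving on. This is a minimal sketch using a Python dataclass. `Stage1Element` and its field names are hypothetical. The checks cover only the rules stated above: a level only on headings, a `LaTeX:` part for formulas, and percentage bounding boxes.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

ELEMENT_TYPES = {"Heading", "Paragraph", "Figure", "Formula", "Image", "Diagram", "Table"}


@dataclass
class Stage1Element:
    page: int                                # 1-indexed page number
    element_type: str                        # one of ELEMENT_TYPES
    text: str                                # verbatim text (headings, paragraphs) or alt-text
    bbox: Tuple[float, float, float, float]  # (x0%, y0%, x1%, y1%) from the top-left
    level: Optional[int] = None              # 1..6 for headings, None otherwise

    def validate(self):
        """Return a list of problems with this record; an empty list means it is usable."""
        problems = []
        if self.page < 1:
            problems.append("page must be 1-indexed")
        if self.element_type not in ELEMENT_TYPES:
            problems.append(f"unknown element type {self.element_type!r}")
        if self.element_type == "Heading" and self.level not in range(1, 7):
            problems.append("Heading needs an integer level 1..6")
        if self.element_type != "Heading" and self.level is not None:
            problems.append("level is only used for Heading")
        if not self.text.strip():
            problems.append("text/alt-text is empty")
        if self.element_type == "Formula" and "LaTeX:" not in self.text:
            problems.append("Formula alt-text needs a 'LaTeX:' part")
        x0, y0, x1, y1 = self.bbox
        if not (0 <= x0 < x1 <= 100 and 0 <= y0 < y1 <= 100):
            problems.append("bbox must be percentages with x0 < x1 and y0 < y1")
        return problems
```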
#### HEADING HIERARCHY RULES
These rules are enforced in Stage 3 verification. Apply them during Stage 1 classification:
1. **Exactly one `level: 1` heading per document.** This is the document title —
usually on a dedicated title/cover slide or the most prominent heading on page 1. If
the first content page IS the title page, mark its title as `level: 1` only and do
NOT also create a `level: 2` entry for the same text.
2. **Every content page gets a `level: 2` heading** (except the page containing the
`level: 1`, and except pages that are genuinely title-less: transition slides,
image-only pages, Q&A slides). Do NOT invent a heading where none is rendered —
omitting a page's heading is better than an empty one.
3. **No level skips.** Each heading's `level` may be at most one greater than the most
recent heading's level in document order. H1 → H2 OK; H1 → H3 NOT OK. Ascending is
unrestricted (H3 → H2 OK).
4. **If the document has no visible title slide** (e.g., a problem set that opens with
"Problem 1"), promote the first rendered heading to `level: 1` and classify
subsequent page headings as `level: 2`. Do not invent an H1 that isn't in the
rendered output.
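Before you touch the PDF, you can check the Stage 1 heading plan against these four rules. This is a minimal sketch that assumes the plan is a list of `(page, level, text)` tuples in document order. `validate_heading_plan` and the `titleless_pages` parameter are hypothetical names. Deciding which pages count as title-less is still a visual judgment.

```python
def validate_heading_plan(headings, num_pages, titleless_pages=()):
    """Check a Stage 1 heading plan against the four heading hierarchy rules.

    headings: (page, level, text) tuples in document order, pages 1-indexed
    num_pages: total page count
    titleless_pages: pages judged genuinely title-less (transition, image-only, Q&A)
    Returns a list of problems; an empty list means the plan passes.
    """
    if not headings:
        return ["no headings identified"]
    problems = []
    h1s = [h for h in headings if h[1] == 1]
    # Rule 1: exactly one level-1 heading
    if len(h1s) != 1:
        problems.append(f"expected exactly one level-1 heading, found {len(h1s)}")
    # Rule 4: the first rendered heading is the level-1 title
    if headings[0][1] != 1:
        problems.append("first heading must be level 1")
    # Rule 1: the title text must not be repeated as a level-2 heading on its page
    for h1_page, _, h1_text in h1s:
        if any(p == h1_page and lvl == 2 and t == h1_text for p, lvl, t in headings):
            problems.append(f"title {h1_text!r} duplicated as level 2 on page {h1_page}")
    # Rule 3: a level may increase by at most one from the previous heading
    for (_, prev, _), (page, level, text) in zip(headings, headings[1:]):
        if level - prev > 1:
            problems.append(f"level skip {prev} -> {level} at {text!r} (page {page})")
    # Rule 2: every other page that is not title-less needs a level-2 heading
    h1_pages = {p for p, _, _ in h1s}
    h2_pages = {p for p, lvl, _ in headings if lvl == 2}
    for page in range(1, num_pages + 1):
        if page not in h1_pages | h2_pages and page not in titleless_pages:
            problems.append(f"page {page} has no level-2 heading")
    return problems
```

For example, a problem set that opens with a level-2 "Problem 1" and jumps to level 4 on page 2 produces four problems: no H1, a first heading that is not level 1, a level skip, and a page without a level-2 heading.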
---
### STAGE 2 — PDF STRUCTURE REMEDIATION
Using `pikepdf` (preferred) or `PyMuPDF`, modify the PDF to add **all** of the following
structural elements required by PDF/UA and WCAG 2.1 AA:
#### 2.1 — Document-Level Requirements
| Requirement | PDF Key | Value |
|---|---|---|
| **Mark as tagged** | `/MarkInfo` in catalog | `<< /Marked true >>` |
| **Document language** | `/Lang` in catalog | `"en-US"` (or appropriate language) |
| **Document title** | `/Title` in document info dict AND `dc:title` in XMP metadata | A descriptive title derived from the PDF content |
| **Display title in viewer** | `/ViewerPreferences` in catalog | `<< /DisplayDocTitle true >>` |
**The document title is required in three places:** Set `/Title` in the info dictionary,
`dc:title` in XMP metadata (via `pdf.open_metadata()`), AND `/DisplayDocTitle true` in
`/ViewerPreferences`. Campus accessibility checkers test for all three independently.
#### 2.2 — Structure Tree and Heading Hierarchy
If the PDF already has a well-formed `/StructTreeRoot` with `/Document` → `/Sect` →
child elements, **augment** it by adding `/Alt` entries to existing `/Figure` elements
and fixing the heading hierarchy (see §2.2.1). If no tree exists, build one with this
hierarchy:
```
/StructTreeRoot
└─ /Document
   ├─ /H1        (document title, EXACTLY ONE per document)
   ├─ /Sect      (one per page)
   │  ├─ /H2     (page/slide title, every content page gets one)
   │  ├─ /P      (body text)
   │  ├─ /H3     (subsection heading, only if visually present)
   │  ├─ /P
   │  ├─ /Figure (with /Alt)
   │  ├─ /Formula (with /Alt)
   │  └─ ...
   └─ /Sect      (next page)
```
Each `/Figure` and `/Formula` structure element **must** carry an `/Alt` string entry
containing the alt-text from Stage 1.
##### 2.2.1 — Heading Struct Elements
Every heading struct element MUST have all three of:
- `/S` set to `/H1`, `/H2`, `/H3`, etc.
- `/Pg` set to the page object whose content stream contains the heading's marked region
- `/K` set to a non-empty integer (the MCID) or an array containing at least one integer
MCID, where the MCID corresponds to an actual BDC…EMC region in that page's content
stream
Helper to build a heading element correctly:
```python
from pikepdf import Dictionary, Name


def make_heading(pdf, level, page_obj, mcid, parent):
    """Build an /Hn struct element bound to a page and an MCID on that page.

    level: int, 1..6
    page_obj: the page dictionary (e.g. pdf.pages[i].obj) whose content stream
        contains the marked region
    mcid: int, the MCID in that page's `<</MCID n>> BDC ... EMC` region wrapping the text
    parent: the parent struct element (usually the page's /Sect, or /Document for H1)

    The returned element is not yet in the tree: the caller must still append it
    to the parent's /K array.
    """
    assert 1 <= level <= 6, "Heading level must be 1..6"
    return pdf.make_indirect(Dictionary({
        "/Type": Name("/StructElem"),
        "/S": Name(f"/H{level}"),
        "/P": parent,
        "/Pg": page_obj,  # REQUIRED: binds heading to its page
        "/K": mcid,       # REQUIRED: integer MCID, not empty
    }))
```
**Fixing an existing tree with empty or mis-nested headings.** If Stage 0 found headings
that violate hierarchy rules, repair in this order:
1. **Empty headings** (no `/K` or `/Pg`): try to locate the MCID that contains the
heading text by parsing the page's content stream (§4.2 MCID parser) and matching the
heading text captured in Stage 1 against the decoded Tj/TJ string operands in each
MCID segment. Set `/K` to that MCID and `/Pg` to the page object.
2. **Multiple H1s**: keep the first one (or the one on the title page), demote the rest
to `/H2` by setting `/S = Name("/H2")`.
3. **Level skips**: walk the heading list in document order. Whenever `next.level -
prev.level > 1`, lower `next.level` (e.g. `/H3` → `/H2`) until the skip is ≤ 1.
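Steps 2 and 3 act only on heading levels, so you can plan them on a plain list before rewriting any `/S` entries. This is a minimal sketch; `repair_heading_levels` and `title_index` are hypothetical names. It also forces the chosen title heading to level 1, which covers rule 4 for documents without a title slide. Each heading is compared against the already-repaired level of the heading before it.

```python
def repair_heading_levels(levels, title_index=0):
    """Plan the multiple-H1 and level-skip repairs on a list of heading levels.

    levels: heading levels in document order, e.g. [1, 1, 3, 2]
    title_index: position of the heading to keep as the only H1 (the first
        heading, or the one on the title page)
    Returns a new list; the input is left unchanged.
    """
    fixed = list(levels)
    if not fixed:
        return fixed
    fixed[title_index] = 1  # rule 4: the title heading is always level 1
    # Step 2: every other H1 becomes H2
    for i, lvl in enumerate(fixed):
        if lvl == 1 and i != title_index:
            fixed[i] = 2
    # Step 3: in document order, allow at most one level deeper than the
    # (already repaired) previous heading, e.g. H1 -> H3 becomes H1 -> H2
    for i in range(1, len(fixed)):
        fixed[i] = min(fixed[i], fixed[i - 1] + 1)
    return fixed
```

Write each planned level back with `node["/S"] = Name(f"/H{level}")` on the matching struct element, in the same document order used to build the list.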
##### 2.2.2 — Binding headings to marked content
*Case A — content stream is being written from scratch (§2.3 path).* Reserve MCID 0 for
the page heading text, MCID 1 for the body. Your `/H2` struct element's `/K` is `0` and
`/Pg` is the page object.
*Case B — content stream already has BDC/EMC (don't modify streams).* Use the §4.2 MCID
parser to build a `{mcid: bytes}` map for each page, then substring-match the heading
text (from Stage 1) against decoded text operands in each segment to find the right
MCID. Use that MCID in `/K`.
*Case C — cannot reliably identify the heading's MCID.* Set `/ActualText` on the heading
struct element to the Stage-1-captured heading string; point `/K` at any MCID on the
page and set `/Pg` to the page object. Assistive tech will read `/ActualText` in
preference to the underlying marked content.
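The Case B text match can be sketched in pure Python. The sketch assumes the heading is drawn with literal-string operands (`(...) Tj` or `[(...)] TJ`) in a simple single-byte encoding. Fonts that use hex strings or Identity-H CIDs need a `/ToUnicode` lookup that this sketch does not perform; for those fonts, use Case C. `segment_text` and `find_heading_mcid` are hypothetical names. `segments` is the `{mcid: bytes}` map produced by the §4.2 parser.

```python
import re

# A literal string operand "( ... )" with backslash escapes. Balanced, unescaped
# nested parentheses (legal in PDF) are not handled by this simple pattern.
_LITERAL = re.compile(rb"\((?:\\.|[^\\()])*\)", re.DOTALL)
_ESCAPES = {b"n": b"\n", b"r": b"\r", b"t": b"\t", b"b": b"\b", b"f": b"\f"}


def _unescape(body):
    """Decode the backslash escapes inside one literal string (parentheses removed)."""
    out = bytearray()
    i = 0
    while i < len(body):
        if body[i:i + 1] != b"\\":
            out += body[i:i + 1]
            i += 1
            continue
        octal = re.match(rb"[0-7]{1,3}", body[i + 1:])
        nxt = body[i + 1:i + 2]
        if octal:  # \ddd octal character code
            out.append(int(octal.group(), 8) & 0xFF)
            i += 1 + len(octal.group())
        elif nxt in _ESCAPES:
            out += _ESCAPES[nxt]
            i += 2
        else:  # \( \) \\ and other escaped bytes stand for themselves
            out += nxt
            i += 2
    return bytes(out)


def segment_text(segment):
    """Concatenate the decoded literal-string operands found in one MCID segment."""
    parts = [_unescape(m.group()[1:-1]) for m in _LITERAL.finditer(segment)]
    return b"".join(parts).decode("latin-1")


def _norm(s):
    # Ignore whitespace and case: TJ arrays split words, and kerning drops spaces
    return re.sub(r"\s+", "", s).casefold()


def find_heading_mcid(segments, heading_text):
    """Return the MCID whose text best contains heading_text, or None (use Case C)."""
    target = _norm(heading_text)
    candidates = []
    for mcid, seg in segments.items():
        text = _norm(segment_text(seg))
        if target and target in text:
            candidates.append((len(text), mcid))  # prefer the tightest match
    return min(candidates)[1] if candidates else None
```

The function prefers the shortest containing segment. A title that is repeated in body text still resolves to the segment that holds only the title.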
#### 2.3 — Marked Content
**FIRST: check whether the original content streams already contain BDC/EMC.** If they
do, do NOT modify the content streams — the existing marked content is sufficient, and
rewriting streams risks corrupting the visual rendering.
```python
import pikepdf


def has_bdc_emc(pdf):
    """True if every page's decompressed content stream contains BDC and EMC."""
    for page in pdf.pages:
        po = page.obj if hasattr(page, "obj") else page
        cs = po.get("/Contents")
        if cs is None:
            return False  # page has no content stream at all
        if isinstance(cs, pikepdf.Array):
            # /Contents may be an array of streams; concatenate them in order
            raw = b"".join(bytes(x.read_bytes()) for x in cs)
        else:
            raw = bytes(cs.read_bytes())  # read_bytes() = decompressed
        if b"BDC" not in raw or b"EMC" not in raw:
            return False
    return True
```
If streams do need BDC/EMC added, **always decompress first** using `read_bytes()` (not
`read_raw_bytes()`), then prepend/append the new BDC/EMC operators, and write back via
`pdf.make_stream()`. **Never concatenate `read_raw_bytes()` output (compressed binary)
with uncompressed text** — this produces an unparseable stream that renders blank pages.
The heading's drawing operators must live inside the `/H2` BDC…EMC block so the heading
is non-empty:
```python
# CORRECT: decompress, split heading operators from body operators, wrap each.
orig = bytes(cs_obj.read_bytes())  # decompressed
# If you can isolate just the slide-title operators (heading_ops_bytes) from the
# body operators (body_ops_bytes), split the stream. Otherwise use the /ActualText
# fallback from §2.2.2 Case C.
new_stream = (
    b"/H2 <</MCID 0>> BDC\n" + heading_ops_bytes + b"\nEMC\n"
    b"/P <</MCID 1>> BDC\n" + body_ops_bytes + b"\nEMC\n"
)
page_obj["/Contents"] = pdf.make_stream(new_stream)

# WRONG: empty heading region
new_stream = b"/H2 <</MCID 0>> BDC EMC\n/P <</MCID 1>> BDC\n" + orig + b"\nEMC\n"
# The /H2 has no content between BDC and EMC. Panorama reports "no headings".

# ALSO WRONG: never concatenate compressed bytes with text
orig = bytes(cs_obj.read_raw_bytes())  # still compressed!
new_stream = b"/H2 <</MCID 0>> BDC EMC\n" + orig  # corrupt, renders blank
```
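An empty heading region still renders correctly, so it is easy to miss by looking at the page. A quick regex scan of the new stream catches the WRONG pattern above before you save. This is a minimal sketch that assumes property lists are simple `<</MCID n ...>>` dictionaries with no nested dictionaries. `empty_mcids` is a hypothetical name.

```python
import re

# An MCID property list, then BDC, then EMC with only whitespace in between:
# an "empty heading region" like the WRONG example.
_EMPTY_REGION = re.compile(rb"/MCID\s+(\d+)[^>]*>>\s*BDC\s+EMC\b")


def empty_mcids(stream_bytes):
    """Return the MCIDs whose BDC...EMC region contains no operators at all."""
    return [int(m.group(1)) for m in _EMPTY_REGION.finditer(stream_bytes)]
```

Run it on `new_stream` before `pdf.make_stream(...)`. A non-empty result means the heading operators did not end up inside the `/H2` block.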
#### 2.4 — Parent Tree
Build a `/ParentTree` number tree in the `/StructTreeRoot` that maps each page's
`StructParents` index to an array of structure elements (one per MCID on that page).
Every page must have a `/StructParents` integer entry.
#### 2.5 — Remove Legacy Annotations
If the PDF has pre-existing annotations from prior (incomplete) remediation attempts,
remove them to avoid confusing accessibility checkers. Only structure-tree-based tagging
should remain.
---
### STAGE 3 — VERIFICATION (FIRST PASS)
After saving the remediated PDF, reopen it and programmatically verify **every**
requirement:
| # | Check | How to verify |
|---|---|---|
| 1 | Document is tagged | `/MarkInfo` → `/Marked` is `true` |
| 2 | Structure tree exists | `/StructTreeRoot` is present in catalog |
| 3 | Document language set | `/Lang` is present and non-empty |
| 4 | Document title in info dict | `/Title` in info dict is non-empty and descriptive |
| 5 | Document title in XMP | `dc:title` in XMP metadata is non-empty |
| 6 | Display title enabled | `/ViewerPreferences` → `/DisplayDocTitle` is `true` |
| 7 | Root element is /Document | `/StructTreeRoot` → `/K` → `/S` == `/Document` |
| 8 | **Exactly one /H1** | Count of `/H1` elements in tree == 1 |
| 9 | **Every content page has a heading** | Every page except explicitly exempted ones has at least one heading struct element with `/Pg` pointing to it |
| 10 | **First heading is /H1** | First heading in document order has level 1 |
| 11 | **No heading-level skips** | For every consecutive pair of headings in doc order, `next.level - prev.level ≤ 1` |
| 12 | **Every heading has non-empty /K** | Every `/Hn` struct element's `/K` is a non-null integer or a non-empty array |
| 13 | **Every heading has /Pg** | Every `/Hn` struct element has a `/Pg` entry pointing to a valid page |
| 14 | **Every heading's MCID is real** | For each heading, parse its `/Pg`'s content stream and confirm the `/K` MCID appears inside a BDC…EMC region with non-empty operators |
| 15 | All figures have /Alt | Every `/Figure` element has a non-empty `/Alt` string |
| 16 | All formulas have /Alt | Every `/Formula` element has a non-empty `/Alt` string |
| 17 | Pages have /StructParents | Every page has a `/StructParents` integer |
| 18 | ParentTree present | `/StructTreeRoot` → `/ParentTree` exists |
| 19 | Marked content in streams | Every page content stream contains `BDC` and `EMC` operators |
**Heading-hierarchy audit helper:**
```python
import pikepdf


def walk_headings(node, out):
    """Depth-first traversal collecting heading struct elements in document order."""
    s = node.get("/S")
    if s is not None:
        name = str(s)
        if len(name) == 3 and name.startswith("/H") and name[2].isdigit():
            out.append((int(name[2]), node))
    k = node.get("/K")
    if k is None:
        return
    children = list(k) if isinstance(k, pikepdf.Array) else [k]
    for ch in children:
        try:
            if ch.is_indirect:
                walk_headings(ch, out)
        except AttributeError:
            pass  # plain int MCID, not a struct element


def audit_hierarchy(pdf):
    """Return a list of failures for checks 8-13; an empty list means all passed."""
    root = pdf.Root["/StructTreeRoot"]["/K"]
    doc_root = root if not isinstance(root, pikepdf.Array) else root[0]
    headings = []
    walk_headings(doc_root, headings)
    failures = []
    if not headings:
        failures.append("no heading struct elements found")
        return failures
    if headings[0][0] != 1:
        failures.append(f"first heading is /H{headings[0][0]}, must be /H1")
    h1_count = sum(1 for lvl, _ in headings if lvl == 1)
    if h1_count != 1:
        failures.append(f"found {h1_count} /H1 elements, must be exactly 1")
    for (a, _), (b, _) in zip(headings, headings[1:]):
        if b - a > 1:
            failures.append(f"heading level skip: /H{a} → /H{b}")
    for lvl, node in headings:
        k = node.get("/K")
        # Check 12: /K must be a non-null integer or a non-empty array
        if k is None or (isinstance(k, pikepdf.Array) and len(k) == 0):
            failures.append(f"/H{lvl} has no /K")
        # Check 13: /Pg must bind the heading to a page
        if node.get("/Pg") is None:
            failures.append(f"/H{lvl} missing /Pg binding")
    return failures
```
**CRITICAL — check #16 zero-formula trap:** If check #16 passes vacuously (zero
`/Formula` elements found) BUT the PDF visually contains equations, this is a
**compliance gap**, not a pass. Flag it and proceed to Stage 4.
Print a pass/fail checklist for each item. For checks 8–14, if any fail, print the
offending heading element's location (page index, MCID, `/S` value) so the remediation
can be re-run or augmented. Walk the structure tree and print every `/Hn`, `/Figure`,
and `/Formula` element with a preview of its alt-text or heading text.
---
### STAGE 4 — FORMULA PROMOTION (when equations are tagged as /Figure)
**Why this stage exists.** Slide-authoring tools (Keynote, PowerPoint, Google Slides) typically tag
rendered equations the same way as other graphics, as `/Figure`. Screen readers can read
`/Figure` alt-text, but math-aware assistive technology treats `/Formula` elements
specially. Promoting equation figures to `/Formula` with LaTeX-embedded alt-text is the
highest-fidelity outcome for STEM content.
**This stage is required when** Stage 3 reports zero `/Formula` elements but the PDF
visually contains equations.
#### 4.1 — Detect Embedded LaTeX Source (latexit)
Many macOS slide authors use **LaTeXiT** to render equations in Keynote. LaTeXiT embeds
the original LaTeX source as a base64-encoded, zlib-compressed Apple binary property list
(bplist) inside a `<latexit>` annotation in the PDF content stream.
To extract it:
```python
import base64
import plistlib
import re
import zlib


def extract_latexit_latex(segment_bytes):
    """Extract LaTeX source from a latexit bplist annotation in a content stream segment."""
    m = re.search(rb'<latexit[^>]*>([A-Za-z0-9+/=\s]+)', segment_bytes)
    if not m:
        return None
    try:
        b64 = re.sub(rb'\s+', b'', m.group(1))
        raw = base64.b64decode(b64)
        # latexit format: 4-byte header, then zlib-compressed bplist
        plist_data = zlib.decompress(raw[4:])
        plist = plistlib.loads(plist_data)
        return plist.get('source')  # original LaTeX source string
    except Exception:
        return None  # not a latexit blob, or a variant of the format not handled here
```
If no latexit annotations are found, fall back to visual inspection and manual LaTeX
transcription from the rasterized page images.
#### 4.2 — Parse MCID Segments from Content Streams
Associate each `/Figure` (and each `/Hn` heading, for §2.2.2 Case B) struct element with its
content stream bytes by parsing the BDC/EMC blocks for MCID numbers:
```python
import re

_MARK_OP = re.compile(rb'\b(BDC|BMC|EMC)\b')
_MCID = re.compile(rb'/MCID\s+(\d+)')


def parse_mcid_segments(raw_bytes):
    """Return dict of MCID (int) → content stream bytes for each BDC/EMC block.

    The MCID is read from the property list between the previous marked-content
    operator and each BDC, so an earlier block's /MCID is never picked up.
    Nested blocks are matched with a stack; BMC (no properties) pushes None.
    """
    segments = {}
    bdc_stack = []
    last_end = 0
    for m in _MARK_OP.finditer(raw_bytes):
        op = m.group(1)
        if op == b'BDC':
            ids = _MCID.findall(raw_bytes[last_end:m.start()])
            mcid = int(ids[-1]) if ids else None  # nearest /MCID before this BDC
            bdc_stack.append((mcid, m.end()))
        elif op == b'BMC':
            bdc_stack.append((None, m.end()))
        elif bdc_stack:  # EMC closes the innermost open block
            mcid, start = bdc_stack.pop()
            if mcid is not None and mcid not in segments:
                segments[mcid] = raw_bytes[start:m.start()]
        last_end = m.end()
    return segments
```
**Important:** The `/K` field of a struct element may contain plain integers (MCID
numbers), not just indirect references. Always guard against `AttributeError` when
calling `.is_indirect` on plain integers by wrapping in try/except.
#### 4.3 — Classify Equation vs. Non-Equation Figures
For each `/Figure` struct element, get its MCID and look up the content stream segment.
The element is an **equation** if:
- The segment contains a `<latexit>` annotation, AND
- The extracted LaTeX source (after stripping `\color` commands) begins with a
mathematical token (`\frac`, `\sqrt`, `\hbar`, `\psi`, `\int`, `\begin{equation}`,
`\begin{align}`, a bare symbol like `k =`, etc.)
The element is **not** an equation if:
- The segment contains a `Do` operator referencing an external image XObject (photo,
plot screenshot, raster image)
- The LaTeX source begins with `\begin{itemize}`, `\begin{enumerate}`, or a capitalized
English word (prose, not math)
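Detect `Do` operators via `bool(re.search(rb'\bDo\b', segment))`. Note that `Do` also paints
Form XObjects (vector content), so confirm that the referenced XObject's `/Subtype` is
`/Image` before treating the segment as a raster image.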
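The classification rules can be folded into one heuristic classifier, sketched below under stated assumptions. `is_equation` is a hypothetical name. `latex` has already been run through `clean_latex` (§4.4). `image_names` is the set of XObject resource names whose `/Subtype` is `/Image`; the caller reads it from the page's `/Resources /XObject` dictionary. The list of math tokens is deliberately short, so extend it for your documents.

```python
import re

# Cleaned LaTeX that starts like math: a known command, a display environment,
# or a bare symbol assignment such as "k =", "E_n =" or "f(x) =". Not exhaustive.
MATH_START = re.compile(
    r"^-?(\\(?:frac|sqrt|hbar|psi|int|sum|partial)"
    r"|\\begin\{(?:equation|align)\*?\}"
    r"|[A-Za-z](?:_\{?\w+\}?)?(?:\([^)]*\))?\s*=)"
)
# Cleaned LaTeX that starts like prose: a list environment or a capitalized word
PROSE_START = re.compile(r"^(\\begin\{(?:itemize|enumerate)\}|[A-Z][a-z]+\s)")
# "/Name Do" paints the XObject resource called Name
_DO = re.compile(rb"/([^\s/\[\]()<>{}%]+)\s+Do\b")


def is_equation(segment, latex, image_names=frozenset()):
    """Heuristic: should this /Figure's MCID segment be promoted to /Formula?

    segment: content-stream bytes for the figure's MCID
    latex: cleaned LaTeX recovered from the segment's <latexit> blob, or None
    image_names: XObject names whose /Subtype is /Image on this page
    """
    painted = {m.group(1).decode("latin-1") for m in _DO.finditer(segment)}
    if painted & set(image_names):
        return False  # paints a raster image: photo, plot screenshot, etc.
    if not latex:
        return False  # no embedded LaTeX: fall back to visual inspection
    s = latex.lstrip()
    if PROSE_START.match(s):
        return False
    return bool(MATH_START.match(s))
```

Form XObjects such as `/Fm3` are not treated as images. A vector-drawn equation therefore still qualifies if its LaTeX looks like math.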
#### 4.4 — Clean LaTeX for Alt-Text
The latexit `source` field contains `\color[rgb]{R,G,B}` wrappers and a preamble color
line. Strip them before including LaTeX in alt-text so AT tools don't read out the color
commands:
```python
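import re


def clean_latex(s):
    """Strip latexit color wrappers and preamble noise so AT does not read them aloud."""
    # Leading preamble color line, e.g. "\color[rgb]{0,0,0}"
    s = re.sub(r'^\s*\\color\[[^\]]+\]\{[^}]+\}\s*\n?', '', s.strip())
    # Remaining \color[model]{spec} and \color{name} commands; repeated in case
    # removing one command leaves another behind
    for _ in range(8):
        s = re.sub(r'\\color\[[^\]]+\]\{[^}]*\}', '', s)
        s = re.sub(r'\\color\{[^}]*\}', '', s)
    s = re.sub(r'^\s*%?\\noindent\s*\n?', '', s, flags=re.MULTILINE)  # \noindent lines
    s = re.sub(r'^\s*%[^\n]*\n', '', s, flags=re.MULTILINE)  # comment lines
    s = re.sub(r'\{\s*\}', '', s)  # empty groups left behind by the removals
    s = re.sub(r'\s{3,}', ' ', s)  # collapse long whitespace runs
    return s.strip()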
```
#### 4.5 — Promote /Figure → /Formula
For each `/Figure` element that passes the equation classifier:
```python
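from pikepdf import Name, String

# obj: the /Figure struct element; spoken_description comes from Stage 1 and
# raw_source from the latexit extraction in §4.1
obj['/S'] = Name('/Formula')
obj['/Alt'] = String(f"{spoken_description}. LaTeX: {clean_latex(raw_source)}")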
```
The alt-text MUST contain both:
- A **spoken-English description** spelling out all symbols (e.g., "negative h-bar
squared over two m times the second partial of psi with respect to x")
- The **clean LaTeX source** prefixed with `LaTeX:` so math-aware AT can render it
Do not promote `/Figure` elements that are raster images, wave-function plots,
screenshots, QR codes, or decorative backgrounds.
#### 4.6 — Re-run Stage 3 Verification
After promotion, re-run the Stage 3 checklist (all 19 items). Check 16 should now show
a non-zero `/Formula` count. Append the second-pass report to the JSON output.
---
### OUTPUT
- Save the remediated PDF as `<basename>_ada.pdf`, where `<basename>` is the uploaded file's
name without the `.pdf` extension
- Save a JSON verification report as `<basename>_ada.report.json`
- Print a summary: total pages, heading-hierarchy audit result, total figures, total
formulas, and overall pass/fail
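The output step can be sketched as follows. The sketch assumes the per-pass results are `{check_number: bool}` dicts and that the heading audit is the failure list returned by `audit_hierarchy`. `output_paths`, `build_report` and `save_report` are hypothetical names, and the JSON layout is one reasonable choice, not a required schema.

```python
import json
from pathlib import Path


def output_paths(input_pdf):
    """Return (remediated PDF path, JSON report path) next to the input file."""
    src = Path(input_pdf)
    return src.with_name(f"{src.stem}_ada.pdf"), src.with_name(f"{src.stem}_ada.report.json")


def build_report(num_pages, figures, formulas, heading_failures, passes):
    """Assemble the verification report.

    heading_failures: list returned by audit_hierarchy (empty means the audit passed)
    passes: one {check_number: bool} dict per verification pass, in order
    """
    final = passes[-1]
    return {
        "total_pages": num_pages,
        "total_figures": figures,
        "total_formulas": formulas,
        "heading_audit": heading_failures or "pass",
        "passes": [{str(k): v for k, v in p.items()} for p in passes],
        "overall_pass": all(final.values()) and not heading_failures,
    }


def save_report(report_path, report):
    """Write the report as indented UTF-8 JSON (keeps arrows in failure messages)."""
    Path(report_path).write_text(json.dumps(report, indent=2, ensure_ascii=False), encoding="utf-8")
```

Overall pass/fail is taken from the last pass. If Stage 4 ran, that is the post-promotion result.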
---
### COMMON PITFALLS
- **"No headings" reported by Panorama even though `/H1` is in the tree.** Three causes:
(1) heading `/K` points to an MCID with no matching BDC…EMC region in any page
content stream (empty heading); (2) heading has no `/Pg` entry, so the checker can't
locate its marked content; (3) the document has multiple `/H1` elements, which some
checkers reject as invalid hierarchy and report as "no valid headings." The Stage 3
audit catches all three.
- **One `/H1` per page is wrong.** A document has exactly one H1 (the title). Each
content page gets an H2.
- **Heading-level skips** (H1 → H3 with no H2 between them) violate PDF/UA-1, which forbids
skipping an intervening heading level. If a visual subheading looks like H3 but has no H2
above it, promote it to H2.
- **Empty heading BDC block.** `/H2 <</MCID 0>> BDC EMC` with no drawing operators
between BDC and EMC produces a heading struct element that checkers treat as empty.
The drawing operators for the heading text must live between BDC and EMC, or use the
`/ActualText` fallback from §2.2.2 Case C.
- **Never mix `read_raw_bytes()` with uncompressed BDC/EMC text** — corrupts the stream
and renders the page blank. Always use `read_bytes()` when reading streams you plan
to modify.
- **Don't skip equations.** Zero `/Formula` elements in a document that visually
contains equations is a compliance gap — run Stage 4.
- **Don't forget the title in three places:** `/Title` in info dict, `dc:title` in XMP,
and `/DisplayDocTitle true` in ViewerPreferences.
- **The info-dict title must be descriptive.** Without a descriptive `/Title` (and
`/DisplayDocTitle true`), PDF viewers show the filename in the title bar instead of a
meaningful title, which fails WCAG 2.4.2 (Page Titled).
- **Guard against `int.is_indirect`** when traversing `/K` arrays. The field may contain
plain integer MCIDs, not just indirect references.
- **Overwriting the input file with pikepdf** requires
`pikepdf.open(path, allow_overwriting_input=True)`.