A 3D-printed molecular model on a desk with a laptop showing the same molecule rendered as SVG — the visual marker for accessible STEM diagram production.

Engineering primer · Accessible STEM diagrams

Accessible STEM diagrams: SVG, ARIA-described content, audio descriptions

Chemistry molecules, biology cell structures, physics force diagrams, math function graphs — the production playbook for STEM imagery that screen readers, refreshable braille, and audio-description streams can actually consume.

A chemistry molecule, a mitochondrion cross-section, a free-body force diagram, the graph of a cubic: nearly every STEM textbook published in the last decade is built from images that a screen reader cannot meaningfully consume. The fix is not “add alt text.” It is a four-layer stack of accessible SVG, structured descriptions, audio descriptions for animated diagrams, and AT-compatibility knowledge that does not transfer between operating systems. This piece is the production playbook.

4 diagram types covered · 3 description layers · 2 AT stacks with known SVG gaps · 15 min read · Updated May 2026

1. Why STEM diagrams are different from every other accessibility problem

A blog hero image with an alt attribute is a solved problem. A STEM diagram is not. Three properties of scientific imagery break the assumptions baked into alt, aria-label, and the screen-reader speech model.

First, the information density is high enough that a single sentence cannot carry it. A benzene ring is six carbons, six hydrogens, alternating double bonds, a delocalised pi system, a planar geometry, a 1.39 angstrom bond length. The alt-text convention asks for “a brief textual replacement”; benzene needs a paragraph. Compressing it into one sentence either loses the structural information (“a benzene molecule”) or produces an unreadable run-on that the screen reader has to read out at 180 words per minute.

Second, the relationships between elements carry as much meaning as the elements themselves. In a free-body diagram, the arrow from the box to the wall is not decoration — it is the normal force, and its angle relative to the gravity vector is the answer to the problem. A flat description cannot encode “the angle between these two arrows is 90 degrees and that is why the problem resolves,” because flat description has no structure. SVG, used carefully, does.

Third, STEM students need to navigate the diagram, not just hear it. A learner working through a graph of a cubic function does not want to hear the alt text from start to finish — they want to land on the local maximum, ask “what is the slope here,” then move to the inflection point. That is a different interaction model than the one screen readers ship with by default. Building it requires keyboard handlers on individual SVG nodes, an ARIA-described content tree, and a fallback for the assistive-tech stacks that cannot keep up.

The four diagram types this piece covers

Chemistry molecules (atoms and bonds), biology cell structures (labelled regions), physics force diagrams (vectors with magnitudes and angles), and math function graphs (curves with named features). Each one stresses a different layer of the accessible-SVG stack, and the playbook at the end is shaped by what breaks for which.


2. SVG as the accessible substrate — and why raster is a dead end

Almost every published STEM textbook still ships its diagrams as PNG or JPG. A raster image is an opaque pixel grid: it has one entry point for assistive tech, the alt attribute, and one fallback, the longdesc attribute that browsers have spent ten years deprecating. There is no structure inside the image that a screen reader can address. The diagram is a black box with a label on the front.

SVG inverts the model. Every shape in an SVG document is a DOM node — addressable, focusable, labellable. A benzene ring rendered as SVG has six circle elements for the carbons, six line elements for the bonds, and an enclosing g element that names the whole. Each of those nodes can carry role, aria-label, aria-labelledby, aria-describedby, and tabindex attributes. The screen reader sees an accessibility tree of named regions instead of a single opaque blob.

The minimum viable accessible SVG carries three things on its root svg element: role=“img”, aria-labelledby pointing to a title child, and aria-describedby pointing to a desc child or to an external paragraph by ID. Each is small. Each does work the other two cannot.

Good-vs-bad SVG markup
Don’t
<img src="benzene.png"
     alt="Benzene molecule" />

The image is opaque. “Benzene molecule” gives a name and nothing else. A learner who needs the bond pattern, the ring geometry, or the carbon-hydrogen count cannot get it from this markup. There is no path to the structural information short of consulting a different source.

Do
<svg role="img"
     aria-labelledby="benz-title"
     aria-describedby="benz-desc"
     viewBox="0 0 200 200">
  <title id="benz-title">Benzene ring</title>
  <desc id="benz-desc">
    A regular hexagon of six carbon atoms,
    each bonded to one hydrogen. Alternating
    single and double bonds form a planar
    aromatic ring with delocalised electrons.
  </desc>
  <g role="group" aria-label="Carbon atoms">
    <circle cx="100" cy="40" r="6"
            tabindex="0"
            aria-label="C1, top"/>
    <!-- C2 to C6 follow the same pattern -->
  </g>
  <g role="group" aria-label="Bonds">
    <!-- one labelled, focusable line element per bond -->
  </g>
</svg>

The root names itself and describes itself. Every atom is a tabbable, named region. On stacks that expose the inner nodes, a screen reader user can hear the summary, then tab into the structure to inspect a single atom or bond. The same markup serves a sighted reader and a non-sighted reader without compromise.

One sharp warning: role=“img” on the root svg changes what assistive tech does with the children. With role=“img”, NVDA and JAWS treat the whole SVG as a single labelled image and do not expose the inner nodes — even if those inner nodes have tabindex. To get both behaviours — a summary label at the top and addressable children inside — leave the root role unset (or set role=“graphics-document” from the W3C Graphics ARIA module) and put the label on a child g rather than the root. Browsers and screen readers handle this combination unevenly. The matrix in section 6 documents what works where.
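A minimal sketch of the alternative root described above, assuming the diagram is generated from data rather than drawn by hand. The names Atom, Bond, benzeneAtoms, benzeneBonds and renderNavigableMolecule are hypothetical, as are the idPrefix parameter and the 60-unit ring radius. The output leaves role="img" off the root, uses role="graphics-document" instead, and hangs the title and desc on a child g so the inner nodes stay addressable.

```typescript
// Illustrative only: these types and functions are not part of any library.
interface Atom { id: string; label: string; x: number; y: number; }
interface Bond { from: string; to: string; order: 1 | 2; }

// Escape text so it is safe inside XML attributes and element content.
function escapeXml(text: string): string {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Six carbons on a regular hexagon; C1 lands at cx=100, cy=40 (the top).
function benzeneAtoms(): Atom[] {
  const positions = ["top", "upper right", "lower right", "bottom", "lower left", "upper left"];
  return positions.map((position, i): Atom => {
    const angle = ((-90 + 60 * i) * Math.PI) / 180;
    return {
      id: `C${i + 1}`,
      label: `C${i + 1}, ${position}`,
      x: Math.round(100 + 60 * Math.cos(angle)),
      y: Math.round(100 + 60 * Math.sin(angle)),
    };
  });
}

// Alternating double and single bonds around the ring.
function benzeneBonds(atoms: Atom[]): Bond[] {
  return atoms.map((atom, i): Bond => {
    const next = atoms[(i + 1) % atoms.length] ?? atom;
    return { from: atom.id, to: next.id, order: i % 2 === 0 ? 2 : 1 };
  });
}

function renderNavigableMolecule(
  idPrefix: string,
  title: string,
  description: string,
  atoms: Atom[],
  bonds: Bond[],
): string {
  const bondNodes = bonds.map((bond) => {
    const a = atoms.find((atom) => atom.id === bond.from);
    const b = atoms.find((atom) => atom.id === bond.to);
    if (!a || !b) throw new Error(`Bond references unknown atom: ${bond.from}-${bond.to}`);
    const name = `${bond.order === 2 ? "Double" : "Single"} bond between ${a.id} and ${b.id}`;
    return `      <line x1="${a.x}" y1="${a.y}" x2="${b.x}" y2="${b.y}" stroke="black" tabindex="0" aria-label="${escapeXml(name)}"/>`;
  });
  const atomNodes = atoms.map(
    (atom) =>
      `      <circle cx="${atom.x}" cy="${atom.y}" r="6" tabindex="0" aria-label="${escapeXml(atom.label)}"/>`,
  );
  return [
    // No role="img" on the root, so the children are not collapsed into one image.
    `<svg xmlns="http://www.w3.org/2000/svg" role="graphics-document" viewBox="0 0 200 200">`,
    // The summary label lives on a child group rather than on the root.
    `  <g role="group" aria-labelledby="${idPrefix}-title" aria-describedby="${idPrefix}-desc">`,
    `    <title id="${idPrefix}-title">${escapeXml(title)}</title>`,
    `    <desc id="${idPrefix}-desc">${escapeXml(description)}</desc>`,
    `    <g role="group" aria-label="Bonds">`,
    ...bondNodes,
    `    </g>`,
    `    <g role="group" aria-label="Carbon atoms">`,
    ...atomNodes,
    `    </g>`,
    `  </g>`,
    `</svg>`,
  ].join("\n");
}

const atoms = benzeneAtoms();
const svg = renderNavigableMolecule(
  "benz",
  "Benzene ring",
  "A regular hexagon of six carbon atoms with alternating single and double bonds.",
  atoms,
  benzeneBonds(atoms),
);
```

Generating the markup from an atom list keeps the labels and the geometry in one place, which is what makes the per-node aria-label values cheap to maintain. Whether a given screen reader exposes the result still depends on the platform.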


3. The longdesc-equivalent stack: where the long description actually lives

The longdesc attribute was the original answer to “an alt attribute is not enough.” Browsers have been quietly removing support for years; Firefox dropped it in version 90, Safari never implemented it, Chrome treats it as a no-op. Anyone still writing longdesc=“benzene-desc.html” in 2026 is shipping markup that nothing reads. The replacement is not a single attribute but a three-layer pattern that combines an inline description, a visible expandable panel, and machine-readable metadata.

Layer one is the inline desc element inside the SVG. Two to four sentences. Read by screen readers when the SVG root is announced. This is the new longdesc — a description that is part of the SVG document and travels with it wherever the SVG goes.

Layer two is a visible expandable description panel next to the diagram, available to every reader, not just screen-reader users. A summary line plus a disclosure button that opens a longer textual walkthrough — usually three to ten sentences for a chemistry molecule, longer for a cell-structure diagram with twenty labelled organelles. The visible panel solves a problem the inline desc cannot: students who can see the diagram but cannot decode it (low-vision learners, dyslexic learners, anyone learning the material for the first time) need the long description too. Putting it behind a button does not hide it from screen readers — they enumerate the disclosure, the user activates it, and the description is read into the buffer.

Layer three is structured metadata via JSON-LD. A CreativeWork object whose accessibilityFeature array enumerates what the diagram offers: longDescription, alternativeText, structuralNavigation, describedMath, tactileGraphic (if a printable tactile is available). Search engines, content recommenders, and accessibility-conformance scanners all consume this metadata. It does nothing for the immediate screen-reader reading experience, but it makes the diagram discoverable as accessible content — which matters when a learner is choosing between three textbooks and one of them advertises its accessibility features in machine-readable form.

JSON-LD Schema.org example

The CreativeWork object lives in a script type=“application/ld+json” block anywhere on the page. Keys: accessibilityFeature (array of strings — longDescription, alternativeText, structuralNavigation, MathML, describedMath), accessibilityHazard (noFlashingHazard, noMotionSimulationHazard), accessibilityAPI (ARIA), and accessMode (textual, visual) plus accessModeSufficient (the access modes that are enough on their own to perceive the work). A diagram that ships all three description layers should publish all of these.
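The keys listed above can be assembled into a block along these lines. This is a sketch under assumptions: buildDiagramJsonLd, toJsonLdScript and the DiagramOptions fields are hypothetical names, and the hazard values should only be published for diagrams that have actually been checked against them. schema.org models accessModeSufficient as an ItemList, so the sketch wraps the access mode in one.

```typescript
// Hypothetical helper names; the property names and values are schema.org's.
interface DiagramOptions {
  name: string;
  usesMathML: boolean;
  hasTactileGraphic: boolean;
}

function buildDiagramJsonLd(options: DiagramOptions): Record<string, unknown> {
  const features = ["alternativeText", "longDescription", "structuralNavigation", "describedMath"];
  if (options.usesMathML) features.push("MathML");
  if (options.hasTactileGraphic) features.push("tactileGraphic");
  return {
    "@context": "https://schema.org",
    "@type": "CreativeWork",
    name: options.name,
    accessibilityFeature: features,
    // Assumption: the diagram has been verified as free of these hazards.
    accessibilityHazard: ["noFlashingHazard", "noMotionSimulationHazard"],
    accessibilityAPI: ["ARIA"],
    accessMode: ["textual", "visual"],
    // Text alone is enough to perceive the work when all description layers ship.
    accessModeSufficient: [{ "@type": "ItemList", itemListElement: ["textual"] }],
  };
}

function toJsonLdScript(data: Record<string, unknown>): string {
  // Escape "<" so no string value can close the script element early.
  const json = JSON.stringify(data, null, 2).replace(/</g, "\\u003c");
  return `<script type="application/ld+json">\n${json}\n</script>`;
}

const script = toJsonLdScript(
  buildDiagramJsonLd({ name: "Benzene ring", usesMathML: false, hasTactileGraphic: true }),
);
```

Building the feature list from flags, rather than pasting a fixed array, keeps the metadata honest: a diagram without a printable tactile version never advertises one.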


4. Audio descriptions for animated diagrams: DOM mutation as a cue stream

Static diagrams are the easy case. The hard case is the animated diagram — a mitochondrion rotating in 3D, a sine wave being traced out across the x-axis, a chemical reaction with bonds breaking and reforming over four seconds. The conventional answer is a video file with an audio-description track, but that abandons the addressability of SVG: the moment you bake the animation into a video, every node you carefully labelled stops being a DOM node and becomes a pixel again.

The accessible alternative is to keep the animation as SVG (or Canvas with an offscreen accessibility tree) and emit audio descriptions as the animation progresses, driven by the same DOM mutations that drive the visual change. The pattern: a MutationObserver watches the SVG for changes — a new transform attribute, a bond appearing, a vector rotating — and at each significant change writes a short text update into a global aria-live=“polite” region. The visual animation drives an audio narration, generated on the fly from the same source of truth.

The implementation has three moving parts. The first is the animation itself, expressed as a sequence of timeline keyframes — the same data the SVG renderer consumes. The second is an annotation layer: each keyframe carries a short text describing what changes at that moment (“bond forms between C1 and C2,” “wave crosses zero from below”). The third is the audio-description driver, which subscribes to the timeline, picks up each annotated keyframe, and writes its text into the live region a few hundred milliseconds before the visual change lands. The lead time matches what production audio description does for film: the description is heard just before the visual event, not after.

Three failure modes are common enough to be worth flagging. First, burst updates. An animation that fires 60 mutations per second drowns the screen reader’s synthesizer: the announcements queue up, lag the animation, and become unintelligible. Annotate only the semantically significant keyframes, not every frame, and throttle to roughly one announcement every 1,500 ms. Second, missing the start. A live region that did not exist before the animation began will not announce its first update reliably (see the aria-live framework piece for the underlying scheduler issue). Mount the live region empty at page load. Third, no pause control. Users need to pause the audio description, slow it down, or step through it one event at a time. Build a small control bar (play, pause, previous-event, next-event) and wire its buttons to the same timeline driver.
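The driver and its throttle can be sketched as a small class. Everything named here is an assumption for illustration: LiveRegionLike, Cue, DescriptionDriver, the 300 ms lead time and the choice to drop stale cues during a burst. The region is typed structurally, so a real element with aria-live="polite" that was mounted empty at page load can be passed in unchanged.

```typescript
// Sketch only: none of these names come from a library.
interface LiveRegionLike {
  textContent: string | null;
}

interface Cue {
  atMs: number; // timeline time of the visual change
  text: string; // annotation to announce
}

class DescriptionDriver {
  private region: LiveRegionLike;
  private cues: Cue[];
  private leadMs: number;
  private minGapMs: number;
  private nextCue = 0;
  private lastAnnouncedAt = Number.NEGATIVE_INFINITY;
  paused = false;

  constructor(region: LiveRegionLike, cues: Cue[], leadMs = 300, minGapMs = 1500) {
    this.region = region;
    this.cues = [...cues].sort((a, b) => a.atMs - b.atMs);
    this.leadMs = leadMs;
    this.minGapMs = minGapMs;
  }

  // Call on every animation tick with the current timeline time.
  tick(nowMs: number): void {
    if (this.paused) return;

    // Find the most recent cue whose lead-adjusted time has arrived.
    let latestDue = -1;
    for (let i = this.nextCue; i < this.cues.length; i++) {
      const candidate = this.cues[i];
      if (candidate === undefined || candidate.atMs - this.leadMs > nowMs) break;
      latestDue = i;
    }
    if (latestDue < 0) return;

    // Throttle: keep the cue pending until the minimum gap has passed.
    if (nowMs - this.lastAnnouncedAt < this.minGapMs) return;

    const cue = this.cues[latestDue];
    if (cue === undefined) return;
    // During a burst only the newest due cue is spoken; older ones are dropped.
    this.region.textContent = cue.text;
    this.lastAnnouncedAt = nowMs;
    this.nextCue = latestDue + 1;
  }
}
```

In a page, tick would typically be called from the same loop that advances the animation, and the control bar's pause button would set paused. Dropping stale cues is one possible policy; queueing them would keep every annotation at the cost of falling behind the visuals.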

prefers-reduced-motion is non-negotiable

Every animated STEM diagram must honour the prefers-reduced-motion: reduce media query. The replacement is not “no animation, no description” — it is a static SVG with the long description from layer two of the description stack expanded by default. Animation is one access mode; described static imagery is another. A vestibular-disorder learner who turned on reduced motion still needs the diagram, just not the rotation.
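The fallback decision fits in a few lines. A minimal sketch, assuming a hypothetical planPresentation function and DiagramPresentation shape; the media query lookup is injected so the logic can run outside a browser, and in a page the argument would be a thin wrapper around window.matchMedia.

```typescript
// Hypothetical names; only the media query string is standard.
interface MediaQueryLike {
  matches: boolean;
}

interface DiagramPresentation {
  animate: boolean;
  longDescriptionExpanded: boolean;
}

function planPresentation(matchMedia: (query: string) => MediaQueryLike): DiagramPresentation {
  const reduce = matchMedia("(prefers-reduced-motion: reduce)").matches;
  // Reduced motion swaps the animation for a static SVG with the
  // long-description panel open by default. It never removes the description.
  return reduce
    ? { animate: false, longDescriptionExpanded: true }
    : { animate: true, longDescriptionExpanded: false };
}

// Browser wiring (not executed here):
//   const plan = planPresentation((query) => window.matchMedia(query));
```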


5. Keyboard navigation between data points in interactive charts

A math function graph that a sighted student can scrub with a cursor is not accessible until a non-sighted student can scrub it with the keyboard. The mechanism is well-known and badly implemented in the wild: each significant data point on the curve gets tabindex=“0”, an aria-label describing its coordinates and any named feature (“local maximum at x = -1, y = 4”), and a keyboard handler that responds to arrow keys for fine-grained movement between adjacent points.

The right granularity matters more than people realise. Tabbing through every plotted pixel of a cubic curve is hostile: the user hears thousands of “x equals 0.1, y equals 0.001” announcements before reaching anything interesting. Tabbing through only the named features (local maxima, minima, inflection points, x-intercepts, y-intercepts) is too sparse. The pragmatic compromise: two layers of navigation. The Tab key cycles through named features only, usually three to seven on a curve, and the arrow keys, once a feature is focused, step left and right along the curve at a learner-defined step size, announcing the coordinate at each step. Home and End jump to the curve’s left and right boundaries. Page Up and Page Down jump to the previous and next named feature.

For a multi-series chart — a chemistry kinetics plot, a physics oscillation graph with two waveforms — add a series-switching axis. Up and Down arrow keys move between series at the current x-coordinate; Left and Right move along the current series. The convention parallels how spreadsheet readers navigate rows and columns and reuses a mental model many users already have.
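The key map above reduces to a pure function from cursor and key to a new cursor, which a keydown handler on the focused SVG node can call with event.key. This is a sketch under assumptions: ChartModel, Cursor, moveCursor and describeCursor are hypothetical names, and mapping Up to the previous series and Down to the next is an arbitrary choice. Tab is left to the browser's normal focus order over the named-feature nodes.

```typescript
// Hypothetical model of a chart; not taken from any charting library.
interface ChartModel {
  xMin: number;
  xMax: number;
  step: number; // learner-defined step size
  featureXs: number[]; // x positions of named features, ascending
  series: Array<(x: number) => number>;
}

interface Cursor {
  series: number;
  x: number;
}

function clamp(value: number, low: number, high: number): number {
  return Math.min(high, Math.max(low, value));
}

// Keys use the standard KeyboardEvent.key names.
function moveCursor(model: ChartModel, cursor: Cursor, key: string): Cursor {
  const { series, x } = cursor;
  const lastSeries = model.series.length - 1;
  switch (key) {
    case "ArrowLeft":
      return { series, x: clamp(x - model.step, model.xMin, model.xMax) };
    case "ArrowRight":
      return { series, x: clamp(x + model.step, model.xMin, model.xMax) };
    case "Home":
      return { series, x: model.xMin };
    case "End":
      return { series, x: model.xMax };
    case "PageDown": {
      const next = model.featureXs.find((fx) => fx > x);
      return { series, x: next ?? x };
    }
    case "PageUp": {
      const previous = [...model.featureXs].reverse().find((fx) => fx < x);
      return { series, x: previous ?? x };
    }
    case "ArrowUp":
      return { series: clamp(series - 1, 0, lastSeries), x };
    case "ArrowDown":
      return { series: clamp(series + 1, 0, lastSeries), x };
    default:
      return cursor;
  }
}

// Text for the announcement at the current cursor position.
function describeCursor(model: ChartModel, cursor: Cursor): string {
  const f = model.series[cursor.series];
  if (f === undefined) throw new Error(`Unknown series ${cursor.series}`);
  const y = Math.round(f(cursor.x) * 100) / 100;
  return `x = ${cursor.x}, y = ${y}`;
}

// Example curve: f(x) = x^3 - 3x + 2, with a local maximum at x = -1, y = 4.
const cubicModel: ChartModel = {
  xMin: -3,
  xMax: 3,
  step: 0.5,
  featureXs: [-2, -1, 0, 1],
  series: [(x) => x * x * x - 3 * x + 2],
};
```

Keeping the movement logic free of DOM access means the same function can drive the focus ring, the live-region announcement and any automated tests.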

One detail that gets missed: the focused data point needs a visible focus indicator. A non-sighted user does not need it, but a sighted screen-reader user does, as do partner instructors watching over the student’s shoulder. Render a focus ring around the focused SVG element with :focus-visible styling. This is the same convention as button focus rings, applied to SVG nodes that the browser does not style by default.

| Diagram type | SVG markup | Long description | Audio description | Keyboard nav |
| --- | --- | --- | --- | --- |
| Chemistry molecule | Required: role group per atom, per bond | Required: 3 to 6 sentences | Only if animated reaction | Tab through atoms, arrow to bonds |
| Biology cell structure | Required: role group per labelled region | Required: 5 to 12 sentences | Only if animated process | Tab through organelles in z-order |
| Physics force diagram | Required: role group per vector | Required: 3 to 5 sentences with magnitudes and angles | Required if interactive (dragging vectors) | Tab through vectors, arrow to rotate |
| Math function graph | Required: named features as nodes | Required: domain, range, asymptotes, features | Optional: only if tracing animation | Tab for features, arrow for fine-grained scrub |

6. AT compatibility: what works and where the SVG tree is broken

The hardest truth in this piece: the accessible-SVG stack does not work the same way across browsers and screen readers, and the gaps are not bugs in your markup. NVDA on Firefox is the most reliable combination, the only one where every pattern in this article behaves the way the W3C SVG accessibility mapping says it should. Every other combination has at least one known gap or quirk.

Safari on macOS with VoiceOver is the most problematic of the major stacks. WebKit’s SVG accessibility tree has documented holes in how it exposes inner g elements with ARIA labels: the labels are present in the DOM and inspectable with the accessibility inspector, but VoiceOver does not always pick them up when the user navigates with VO-Right-Arrow. The behaviour is inconsistent — sometimes the inner labels announce, sometimes only the root SVG label is read, with no client-visible pattern. The WebKit bugzilla has open issues going back to 2020 on this. The pragmatic implication: if your STEM diagram works on Mac, that is a necessary condition, not a sufficient one. Test on Windows with NVDA before shipping.

Chrome on Windows with JAWS is the second most reliable stack — close to NVDA + Firefox, with one wrinkle: JAWS treats SVG role=“img” more aggressively than NVDA, collapsing inner nodes more often. The fix is to use role=“graphics-document” from the W3C Graphics ARIA module on the root svg, which JAWS handles correctly. NVDA also handles it correctly. Firefox and Chrome both ship the necessary platform-API mappings.

Mobile is a separate problem. iOS VoiceOver inherits WebKit’s SVG gaps; Android TalkBack on Chrome handles inner nodes reliably but does not yet support W3C Graphics ARIA roles, so it falls back to role=“img” behaviour. For a textbook publisher targeting both desktop and mobile, the safe choice is to ship two SVG modes: a structurally-navigable mode for desktop and a “summary plus long description” mode that disables inner navigation on mobile. The mode switch is driven by user agent and by user preference, stored across sessions.
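The mode switch might look like the following. It is a rough sketch, not a recommendation for user-agent sniffing in general: SvgMode, PreferenceStore, MODE_KEY, chooseSvgMode and rememberSvgMode are hypothetical names, and the regular expression is a deliberately crude assumption. One known limitation is that Safari on iPadOS reports a desktop-style user agent by default, so the stored preference has to be the escape hatch there.

```typescript
type SvgMode = "navigable" | "summary";

// Structural subset of the Web Storage API; window.localStorage satisfies it.
interface PreferenceStore {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
}

// Hypothetical storage key.
const MODE_KEY = "stem-svg-mode";

function chooseSvgMode(userAgent: string, store: PreferenceStore): SvgMode {
  // An explicit user choice always wins over detection.
  const saved = store.getItem(MODE_KEY);
  if (saved === "navigable" || saved === "summary") return saved;
  // Assumption: a coarse user-agent check is good enough for the default.
  return /Android|iPhone|iPad|iPod/.test(userAgent) ? "summary" : "navigable";
}

function rememberSvgMode(mode: SvgMode, store: PreferenceStore): void {
  store.setItem(MODE_KEY, mode);
}

// Browser wiring (not executed here):
//   const mode = chooseSvgMode(navigator.userAgent, window.localStorage);
```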

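| Pattern | NVDA + Firefox | JAWS + Chrome | VoiceOver + Safari | TalkBack + Chrome |
| --- | --- | --- | --- | --- |
| SVG title and desc (root) | OK | OK | OK | OK |
| Inner g with aria-label | OK | OK | Partial | OK |
| tabindex on SVG nodes | OK | OK | Partial | Fails |
| role=“graphics-document” | OK | OK | Fails | Fails |
| aria-live driven by mutations | OK | OK | Partial | Partial |
| focus-visible on SVG nodes | OK | OK | OK | OK |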

One reading of the matrix: ship NVDA + Firefox as the baseline conformance target, document the Safari and TalkBack fallbacks, and never use the absence of an inner-node announcement on a Mac as evidence that the SVG is inaccessible. The diagram may be perfectly accessible; the platform just is not exposing the labels you wrote. The accessibility inspector in Safari’s Develop menu shows what is in the tree even when VoiceOver fails to read it, and is the right tool for distinguishing “broken markup” from “broken platform.”


7. The production playbook

1. Author every STEM diagram as SVG, never as raster

PNG and JPG are dead ends. SVG gives you a DOM, and the DOM is where every accessibility feature in this piece lives. If your authoring pipeline produces raster (most chemistry-drawing tools still do), add a step that exports SVG too, and ship both — the SVG is the accessible artifact, the PNG is a fallback for legacy printers.

2. Put title and desc on every SVG root

Two children. Title is the short name. Desc is two to four sentences describing what the diagram shows. Wire them up with aria-labelledby and aria-describedby on the root. No exceptions, even for “small” diagrams — a small molecule is still a molecule, and a screen-reader user has the same right to hear the structure that a sighted user has to see it.

3. Add a visible expandable long-description panel next to every diagram

Three to ten sentences, in a disclosure-pattern panel that any reader can open. Solves the description need for low-vision and dyslexic learners that the SVG desc alone does not serve. Mirror the description text into the SVG desc for screen-reader users who do not encounter the disclosure.

4. Publish a JSON-LD CreativeWork with accessibilityFeature

One block per page or per diagram. Enumerate every accessibility feature the diagram actually carries. Search engines and conformance scanners read this; learners using a CMS that filters for accessible content read this. It is cheap to write and pays back the moment someone is choosing between resources.

5. Drive audio description for animated diagrams from DOM mutations

One MutationObserver per animated SVG. Annotated keyframes in the animation timeline. A global, empty aria-live=“polite” region mounted at app start, before any diagram renders. Throttle to roughly one announcement every 1,500 ms. Honour prefers-reduced-motion: reduce by collapsing to the static-plus-long-description fallback.

6. Make interactive charts keyboard-navigable at two granularities

Tab through named features only. Arrow keys for fine-grained movement along the curve. Home, End, Page Up, Page Down for boundary and feature jumps. Up and Down arrows switch series in multi-series plots. Render a visible focus ring on the focused SVG node — non-sighted users do not need it, sighted screen-reader users do.

7. Test on NVDA + Firefox before any other combination

The reference platform. If a pattern fails there, the markup is wrong. If it works there but fails on Safari, the platform is wrong and the next step is documenting the fallback rather than rewriting the SVG. JAWS + Chrome is the secondary acceptance test. VoiceOver + Safari is necessary for parity but never sufficient.


Conclusion: STEM accessibility is a markup problem with an interaction-design tail

Most published guidance on STEM diagram accessibility stops at the title-and-desc layer. That is the easy 30 percent. The remaining 70 percent — the long description panel, the audio-description timeline driven by DOM mutations, the two-granularity keyboard navigation, the platform-specific fallbacks — is interaction design as much as it is markup. The screen reader is one user; a non-sighted learner using a screen reader to navigate a function graph at the pace of a sighted classmate is a different user, with different needs.

The dividend is large and uneven. A textbook publisher who ships the full stack across roughly 600 diagrams in a calculus textbook serves every non-sighted learner using that textbook, every low-vision learner who appreciates the disclosure panel, every dyslexic learner who can read the long description but cannot decode the visual, every English-as-a-second-language learner who finds the structured description easier than the visual conventions of the field, and every sighted instructor producing audio summaries for podcasts. The same markup serves five distinct audiences. The cost is a few hours per diagram, amortised across decades of student use.

The current state of the art is uneven because the accessibility-tree implementations differ across the operating systems students actually use. NVDA and JAWS on Windows have closed most of the SVG gaps. Safari on macOS has not. Until the platforms converge, the production pattern is to author for the strictest target — NVDA + Firefox — and document the fallbacks for the platforms with known gaps. That is more work than the alt-attribute model used to require. It is also the only way to ship a STEM textbook that does not exclude the readers it is supposed to teach.

“A benzene ring is six carbons, six hydrogens, alternating double bonds, a delocalised pi system, a planar geometry, a 1.39 angstrom bond length. The alt-text convention asks for one sentence. SVG asks the right question instead: which atom would you like to land on first?”

— Disability World engineering desk, May 2026