Captions (Prerecorded)
Every prerecorded video with audio needs synchronized captions covering dialogue, speaker identification, and meaningful non-speech sounds — so deaf and hard-of-hearing users get the same information from the soundtrack as everyone else.
What it asks
Captions are a synchronized text version of all audio information in a video — dialogue, speaker identification, and non-speech sounds that affect meaning (laughter, applause, a phone ringing, music that sets a mood). Subtitles that only translate dialogue are not enough; captions for accessibility include the sonic context. The only exception is media that is itself a media alternative for text (and is clearly labelled as such).
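The sketch below shows what a caption track carries beyond dialogue by rendering a few cues into WebVTT text. The `Cue` dataclass, `format_timestamp`, and `to_webvtt` are hypothetical helpers written for this example and are not a library API. Speakers are marked with WebVTT's `<v>` voice tag, and non-speech sounds use the bracket convention from this page.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Cue:
    start: float  # seconds from the start of the media
    end: float
    text: str
    speaker: str | None = None  # hypothetical field: who is talking, if anyone


def format_timestamp(seconds: float) -> str:
    """Format seconds as a WebVTT timestamp (hh:mm:ss.ttt)."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


def to_webvtt(cues: list[Cue]) -> str:
    """Render cues as a WebVTT document, tagging speakers with <v>."""
    blocks = ["WEBVTT"]
    for cue in cues:
        timing = f"{format_timestamp(cue.start)} --> {format_timestamp(cue.end)}"
        # Dialogue gets a voice span naming the speaker; sound cues stay bare.
        body = f"<v {cue.speaker}>{cue.text}" if cue.speaker else cue.text
        blocks.append(f"{timing}\n{body}")
    # WebVTT separates the header and each cue with a blank line.
    return "\n\n".join(blocks) + "\n"


cues = [
    Cue(0.0, 2.5, "Welcome back to the show.", speaker="Dana"),
    Cue(2.5, 4.0, "[audience applause]"),
    Cue(4.0, 6.2, "Thanks for having me.", speaker="Lee"),
]
print(to_webvtt(cues))
```

Sound cues have no voice tag because they do not belong to a speaker. They still sit in the same timed track as the dialogue.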
How to meet it
- Upload a WebVTT (`.vtt`) caption file with every prerecorded video via the `<track kind="captions">` element, or a `.vtt` or `.srt` file through the platform’s captions field. Browsers’ native `<track>` element reads WebVTT, not SRT.
- Edit auto-generated captions before publishing. YouTube auto-captions often get punctuation, speaker changes, and non-speech audio wrong, or leave them out.
- Identify speakers when more than one person is talking, especially when they are off-screen.
- Caption meaningful non-speech sounds, such as [laughter], [door slams], or [suspenseful music]. Skip ambient sounds with no narrative weight.
- Keep captions within roughly one second of the audio they represent. A long lag breaks comprehension.
- For embedded players (Vimeo, Wistia, Mux), make sure the captions track is attached and enabled in the player, and that keyboard users can reach and toggle the captions control.
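You can spot-check the "roughly one second" sync guideline automatically if you have reference start times for the spoken lines, for example from a forced-alignment tool or a manual review. The following is a minimal sketch under stated assumptions. It parses simple WebVTT timing lines with a regular expression and flags cues whose start drifts from the reference by more than a tolerance. The helpers `cue_starts` and `find_drift`, the one-to-one pairing of cues with reference times, and the 1.0 s default are all assumptions for illustration. The parser covers only plain timing lines, not the full WebVTT grammar.

```python
from __future__ import annotations

import re

# Matches "hh:mm:ss.ttt --> hh:mm:ss.ttt" (hours optional); trailing cue settings are ignored.
TIMING = re.compile(
    r"^(?:(\d+):)?(\d{2}):(\d{2})\.(\d{3})\s+-->\s+(?:(\d+):)?(\d{2}):(\d{2})\.(\d{3})"
)


def to_seconds(h: str | None, m: str, s: str, ms: str) -> float:
    """Convert captured timestamp fields to seconds."""
    return int(h or 0) * 3600 + int(m) * 60 + int(s) + int(ms) / 1000


def cue_starts(vtt: str) -> list[float]:
    """Return the start time, in seconds, of every cue in a WebVTT string."""
    starts = []
    for line in vtt.splitlines():
        match = TIMING.match(line.strip())
        if match:
            starts.append(to_seconds(*match.groups()[:4]))
    return starts


def find_drift(vtt: str, reference: list[float], tolerance: float = 1.0) -> list[tuple[int, float]]:
    """Return (cue index, drift in seconds) for cues off by more than `tolerance`.

    Assumes cues and reference timings correspond one-to-one, in order.
    """
    drifts = []
    for index, (start, expected) in enumerate(zip(cue_starts(vtt), reference)):
        drift = start - expected
        if abs(drift) > tolerance:
            drifts.append((index, round(drift, 3)))
    return drifts


sample = """WEBVTT

00:00:01.000 --> 00:00:03.000
<v Dana>Welcome back.

00:00:05.800 --> 00:00:07.000
<v Lee>Thanks for having me.
"""
print(find_drift(sample, reference=[1.2, 4.1]))  # → [(1, 1.7)]: second cue is 1.7 s late
```

A positive drift means the caption appears after the speech, and a negative drift means it appears before. Either direction beyond the tolerance is worth a manual look.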
Common failures
- “Auto-generated captions are good enough.” They are not. Unedited auto-captions routinely mistranscribe names, technical terms, and overlapping speech.
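- Burned-in open captions that cannot be turned off or resized, cause layout problems on mobile, and cannot be translated. Accurate open captions can technically satisfy 1.2.2, but a separate caption track avoids these problems.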
- Music videos with lyrics captioned but no [music] indicator during instrumental breaks.
- Multi-speaker interviews with no speaker labels. The viewer reads “yes” but cannot tell who said it.
- A captions toggle that defaults to off and is hidden behind a tiny “CC” button most users miss.
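A rough automated pass can catch some of these failures before human review. The sketch below is hypothetical: the `audit` function, the `CaptionCue` tuple, the speaker-label patterns, and the 10-second gap threshold are all assumptions for illustration. It flags three things:

- a track with no bracketed sound cues;
- a multi-speaker video with no speaker labels;
- long stretches with no caption, such as an unlabelled instrumental break.

It cannot judge transcription accuracy, so it complements an edit pass rather than replacing one.

```python
from __future__ import annotations

import re
from typing import NamedTuple

SOUND_TAG = re.compile(r"\[[^\]]+\]")  # bracketed sound cue, e.g. [laughter], [music]
# A WebVTT voice tag (<v Dana>) or an all-caps "NAME:" prefix at the start of a line.
SPEAKER_TAG = re.compile(r"<v\s+[^>]+>|^[A-Z][A-Z .'-]*:", re.MULTILINE)


class CaptionCue(NamedTuple):
    start: float  # seconds from the start of the media
    end: float
    text: str


def audit(cues: list[CaptionCue], speakers: int, max_gap: float = 10.0) -> list[str]:
    """Flag a few common caption failures. Heuristic only: it cannot check accuracy."""
    warnings = []
    if not any(SOUND_TAG.search(cue.text) for cue in cues):
        warnings.append("no non-speech sound cues (e.g. [laughter], [music])")
    if speakers > 1 and not any(SPEAKER_TAG.search(cue.text) for cue in cues):
        warnings.append(f"{speakers} speakers but no speaker labels")
    # Long uncaptioned stretches may be an instrumental break missing a [music] cue.
    for prev, nxt in zip(cues, cues[1:]):
        gap = nxt.start - prev.end
        if gap > max_gap:
            warnings.append(f"{gap:.1f}s gap after {prev.end:.1f}s with no caption")
    return warnings


interview = [
    CaptionCue(0.0, 3.0, "Welcome back to the show."),
    CaptionCue(3.0, 5.0, "Thanks for having me."),
    CaptionCue(20.0, 23.0, "So, about the new album..."),
]
for warning in audit(interview, speakers=2):
    print(warning)
```

The caller supplies the expected speaker count because a single-narrator video legitimately has no speaker labels.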
Why it matters
Around 15% of US adults report some trouble hearing. Captions are also used heavily by non-native speakers, by people watching in noisy environments, and by anyone scrubbing through a long video. Missing or inaccurate captions, a failure of 1.2.2, are among the most frequently cited issues in audits of corporate video libraries.