Standards · WCAG 2.2

SC 1.2.2 · Level A · WCAG 2.0

Captions (Prerecorded)

Every prerecorded video with audio needs synchronized captions covering dialogue, speaker identification, and meaningful non-speech sounds — so deaf and hard-of-hearing users get the same information from the soundtrack as everyone else.

What it asks

Captions are a synchronized text version of all audio information in a video: dialogue, speaker identification, and non-speech sounds that affect meaning (laughter, applause, a phone ringing, music that sets a mood). Subtitles that only translate dialogue are not enough, because accessible captions must also convey the non-speech audio. The only exception is media that is itself a media alternative for text (and is clearly labelled as such).
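
To make the three kinds of caption content concrete, here is a minimal Python sketch that sorts cue text into non-speech sounds, speaker-labelled dialogue, and plain dialogue. The sample cues, the classify_cue name, and the conventions it relies on (square brackets for sounds, a WebVTT <v Name> voice tag or an upper-case NAME: prefix for speakers) are illustrative assumptions, not requirements of WCAG.

```python
import re

# Assumed conventions (common in captioning style guides, not mandated by WCAG):
#   [laughter]        -> non-speech sound in square brackets
#   <v Maria>Hello    -> WebVTT voice tag naming the speaker
#   MARIA: Hello      -> upper-case speaker prefix
SOUND = re.compile(r"^\[[^\]]+\]$")
VOICE_TAG = re.compile(r"^<v\s+([^>]+)>")
PREFIX = re.compile(r"^([A-Z][A-Z .'-]*):\s")


def classify_cue(text):
    """Return 'sound', 'speaker-labelled', or 'dialogue' for one cue's text."""
    text = text.strip()
    if SOUND.match(text):
        return "sound"
    if VOICE_TAG.match(text) or PREFIX.match(text):
        return "speaker-labelled"
    return "dialogue"


# Hypothetical cue texts, for illustration only.
cues = [
    "<v Maria>Did you hear that?",
    "[phone ringing]",
    "I'll get it.",
    "JOHN: Leave it.",
]
kinds = [classify_cue(c) for c in cues]
print(kinds)
```

A translated subtitle file would typically contain only the "dialogue" kind, which is why it falls short of captions.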

How to meet it

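  • Attach a WebVTT (.vtt) caption file to every prerecorded video with the <track kind="captions"> element, or upload a .vtt or .srt file through the platform’s captions field. Browsers expect WebVTT for <track>, so convert .srt files before using them there.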
  • Edit auto-generated captions before publishing — YouTube auto-captions miss punctuation, speaker boundaries, and non-speech audio.
  • Identify speakers when more than one person is talking, especially when they are off-screen.
  • Caption meaningful non-speech sounds: [laughter], [door slams], [suspenseful music]. Skip ambient sounds with no narrative weight.
  • Keep captions in sync within roughly one second of the audio; long lag breaks comprehension.
  • For embedded videos (Vimeo, Wistia, Mux), make sure the captions track is attached to the embed and that the player’s captions control can be reached and toggled from the keyboard.
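
Because the HTML <track> element expects WebVTT, an existing .srt file usually needs a small conversion before it can be attached that way. The following is a minimal Python sketch, assuming simple SRT input with no styling tags; the srt_to_vtt name and the sample captions are hypothetical.

```python
import re

# SRT timestamps use a comma before the milliseconds (00:00:01,500);
# WebVTT uses a period (00:00:01.500).
SRT_TIMING = re.compile(
    r"(\d{2,}:\d{2}:\d{2}),(\d{3}) --> (\d{2,}:\d{2}:\d{2}),(\d{3})"
)


def srt_to_vtt(srt_text):
    """Convert simple SRT captions to WebVTT text."""
    # Drop a UTF-8 byte-order mark if present and normalise line endings.
    body = srt_text.lstrip("\ufeff").replace("\r\n", "\n").strip()
    # Rewrite only the timing lines; cue text is left untouched.
    body = SRT_TIMING.sub(r"\1.\2 --> \3.\4", body)
    # A WebVTT file starts with the WEBVTT signature followed by a blank line.
    return "WEBVTT\n\n" + body + "\n"


# Hypothetical sample with a speaker label and a non-speech sound.
sample_srt = """1
00:00:01,000 --> 00:00:03,500
MARIA: Did you hear that?

2
00:00:03,600 --> 00:00:05,000
[door slams]
"""

vtt = srt_to_vtt(sample_srt)
print(vtt)
```

The SRT cue numbers are kept as they are, since WebVTT allows an optional identifier line before each timing line. The result could then be saved under a name such as captions.vtt (a placeholder) and referenced from <track kind="captions" src="captions.vtt" srclang="en" label="English"> inside the <video> element. The conversion does not fix timing or wording, so the captions still need a human check for sync and accuracy.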

Common failures

  • “Auto-generated captions are good enough.” They are not: auto-captions routinely mistranscribe names, technical terms, and overlapping speech.
  • Burned-in open captions that cannot be turned off: they can satisfy 1.2.2, but they cause layout issues on mobile and prevent translation.
  • Music videos with lyrics captioned but no [music] indicator during instrumental breaks.
  • Multi-speaker interviews with no speaker labels — the user hears “yes” but cannot tell who said it.
  • A captions toggle that defaults to off and is hidden behind a tiny “CC” button most users miss.
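
Some of these failures can be caught before a manual review. As a rough first pass, the Python sketch below scans static HTML for <video> elements that have no <track kind="captions"> child. The CaptionTrackAudit class and the sample markup are hypothetical, and the check says nothing about caption quality or about videos embedded through third-party players.

```python
from html.parser import HTMLParser


class CaptionTrackAudit(HTMLParser):
    """Collect <video> elements that have no <track kind="captions"> child."""

    def __init__(self):
        super().__init__()
        self.missing = []  # labels of videos lacking a captions track
        self._open = []    # stack of [label, has_captions] for open <video> tags

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "video":
            label = a.get("id") or a.get("src") or "(unnamed video)"
            self._open.append([label, False])
        elif tag == "track" and self._open:
            # Only kind="captions" counts; translated subtitles do not.
            if (a.get("kind") or "").lower() == "captions" and a.get("src"):
                self._open[-1][1] = True

    def handle_endtag(self, tag):
        if tag == "video" and self._open:
            label, has_captions = self._open.pop()
            if not has_captions:
                self.missing.append(label)


# Hypothetical page fragment for illustration.
page = """
<video id="intro" src="intro.mp4" controls>
  <track kind="captions" src="intro.en.vtt" srclang="en" label="English">
</video>
<video id="demo" src="demo.mp4" controls>
  <track kind="subtitles" src="demo.fr.vtt" srclang="fr" label="French">
</video>
<video id="teaser" src="teaser.mp4" controls></video>
"""

audit = CaptionTrackAudit()
audit.feed(page)
audit.close()
print(audit.missing)  # ['demo', 'teaser']
```

A video that passes this check can still fail 1.2.2 if the caption file is unedited auto-captioning or lacks speaker labels, so the scan narrows the audit rather than replacing it.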

Why it matters

Around 15% of adults in the United States report some trouble hearing; captions are also used heavily by non-native speakers, people watching in noisy environments, and anyone scrubbing through a long video. Missing or unedited captions are among the most commonly cited issues in audits of corporate video libraries.