Audio Description Guidelines

Audio description is sometimes known as described video, video description, or visual interpretation.

Audio description gives any user with a visual or cognitive disability a detailed, descriptive account of what is happening on the screen throughout the video. A number of organizations, such as the Canadian Radio-television and Telecommunications Commission (CRTC), require audio description (AD) and described video (DV), also called video description.

Audio description is also used by people who have some vision, so both the audio and the video elements are needed (for example, someone with mild vision loss who cannot read on-screen text but wants to see the rest).

Audio description and described video make TV programs accessible for people who are blind or who have visual impairments:

Audio description (AD):: Relies on a program host or announcer to provide a voice-over by reading aloud or describing key elements of programming, such as text and graphics that appear on the screen. It is often used for information-based programming, including newscasts, weather reports, sports scores, and financial data. Most broadcasters are required to provide audio description.
Described video (DV), or video description:: A narrated description of a program's main visual elements, such as settings, costumes, and body language. The description is added during pauses in dialogue and enables people to form a mental picture of what is happening in the program. Described video typically uses a separate audio track.

Source: TV access for people who are blind or partially sighted: Described video and audio description (CRTC)

Note: If audio description is being used for a video, then the descriptions need to be included in the transcript.
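One way to fold descriptions into a transcript is to interleave them with the dialogue by start time. The following is a minimal sketch under stated assumptions: the `Cue` type, the `build_transcript` function, and the `[Description]` label are hypothetical names chosen for illustration, and both input lists are assumed to be already sorted by start time.

```python
from dataclasses import dataclass
from heapq import merge


@dataclass(frozen=True)
class Cue:
    """A timed line of text (hypothetical structure for illustration)."""
    start: float  # seconds from the start of the video
    text: str
    kind: str     # "dialogue" or "description"


def build_transcript(dialogue, descriptions):
    """Interleave dialogue and description cues by start time.

    Both inputs are assumed to be sorted by `start` already.
    """
    lines = []
    for cue in merge(dialogue, descriptions, key=lambda c: c.start):
        # Label descriptions so readers can tell them apart from dialogue.
        prefix = "[Description] " if cue.kind == "description" else ""
        lines.append(f"{prefix}{cue.text}")
    return lines


if __name__ == "__main__":
    dialogue = [Cue(0.0, "Hello.", "dialogue"), Cue(5.0, "Goodbye.", "dialogue")]
    descriptions = [Cue(2.5, "A woman waves from a doorway.", "description")]
    for line in build_transcript(dialogue, descriptions):
        print(line)
```

How descriptions are labelled in the finished transcript is a style decision; the bracketed prefix here is only one possibility.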

Definitions

Describer:: The person who writes or generates the descriptions, whether in advance or live on the spot.
Narrator:: The person who speaks the descriptions aloud. The narrator can be the same person as the describer and can, in theory, be a machine using speech output.
Production:: The single, discrete artwork being described, such as a play, a television program, a dance performance, a film, or a photograph. Describing a complete television series, by contrast, involves a sequence of productions.

Standard

Describe what you observe:

Explanation:

It's the most basic requirement of audio description, but one that is routinely ignored.

Change history:

"Describe what you see" is something of a buzzword among describers, but "Describe what you observe" may be slightly better, prompting the describer to actually think about what is seen rather than jotting down a bare-bones and rote description.
Describers and narrators serve the audience and the production, not themselves:

Explanation:

You're not providing descriptions to show off your vocabulary or to highlight your beautiful voice. You work for the production and the audience. A certain self-effacement is required.
If time limits force you to be selective, first describe what is essential to know, such as actions and details that would confuse or mislead the audience if omitted.
Whenever possible, describe actions and details that add to the understanding of personal appearance, setting, atmosphere, and mise-en-scène (scenery).
Descriptions are usually delivered during pauses or quiet moments. It is permissible to let pauses or quiet moments pass without a description. Conversely, since it is more important to make a production understandable than to preserve every detail of the original soundtrack, it is permissible to describe over dialogue and other audio when necessary.
Describe as consistently as possible, using the same character names and terminology throughout a production or across several related productions, unless exceptions are warranted.
Describe any obvious emotional states. Do not attempt to describe what is invisible, such as a mental state, reasoning, or motivation.
Deliver descriptions in a vocal style that melds into the surrounding audio at the point of the description:

Explanation:

We may need to add this qualifier to the principle: "Descriptions must not sound self-contained, prepackaged, or delivered according to a predetermined pattern." This principle seeks to solve the problem of description snippets recorded in isolation that all sound the same and do not match the actual production.
Narrators' voices must be distinguishable from other voices in a production.
Read titles and credits wherever possible, including subtitles in a foreign-language production.
Do not censor. Violence, sexuality, salty language, political imagery, or anything else a describer or narrator may personally dislike must nonetheless be described where applicable:

Explanation:

Describers and narrators do not get to pick and choose what to describe purely to satisfy their personal biases. ("Salty language" here refers to visible vulgar language, like a bumper sticker or T-shirt. Narrators may be required to utter words they would not ordinarily use.)
Do not specify an exact passage of time unless indisputable visual evidence supports it:

Explanation:

Say "nighttime," not "that night," unless you can prove from visible evidence that it is that night. Doing otherwise misleads the audience.
Extended descriptions – giving, for example, background on the production or definitions of terms – can be provided where possible but must limit themselves to the production actually at hand.
Describe in the language of the audience, not the production:

Explanation:

A program with segments in French and English should be described in English on an English-language television station. A Spanish-language production with Dutch subtitles should be described in Dutch on a Dutch TV station even though the surrounding audio isn't in Dutch. Truly bilingual programs on truly bilingual stations are rare, and in those cases the describer would still comply with this principle by describing in either of those languages (or by switching from one to another).

Audio Description Checklist

Does the audio description:

Describe what the viewer needs to know?
- Prioritize important content?
- Describe actions and details that add to the understanding of personal appearance, setting, atmosphere, and mise-en-scène (scenery)?
Describe only what is seen?
- Exclude motivations or intentions?
- Narrate the main visual elements observed (settings, costumes, and body language)?
- Describe obvious emotional states (excluding what is invisible, such as mental state, reasoning, or motivation)?
Describe where unidentified sounds are coming from (for example, a phone ringing)?
Include credits, subtitles, and captions?
Have a clear voice distinction between the description and the video's audio?
Avoid interfering with the important elements in the video's original audio?
Include tone and manner of voice, as needed (for example, whispering)?
Read titles and credits wherever possible?

Is the audio description:

Accurate?
Good word choice?
Good pronunciation?
Good diction?
Good enunciation?
Consistent use of names and terms?
Consistent?
- Content and voicing match the style, tone, and pace of the video?
- Synchronized, occurring at approximately the same time as the corresponding content appears in the video?
Prioritized?
- Content is essential for comprehension?
Appropriate for intended audience?
- Described in the language of the audience?
  - Is it objective?
  - Is it simple?
  - Is it succinct?
Equal?
- The meaning and intention of the material are preserved and conveyed?
- Uncensored?

Keyboard Access Guidelines

Accessible media players provide a user interface that works without a mouse, through a speech interface, when the page is zoomed in, and with screen readers. For example, media players need to:

Provide keyboard support (in Understanding WCAG: Keyboard Accessible)
Make the keyboard focus indicator visible (in Understanding WCAG: Focus Visible)
Provide clear labels (in Understanding WCAG: Labels or Instructions, Info and Relationships)
Have sufficient contrast between colors for text, controls, and backgrounds (in Understanding WCAG: Contrast (Minimum), Contrast (Enhanced), Non-text Contrast)
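The contrast requirement can be checked numerically. The sketch below follows the relative luminance and contrast ratio formulas defined in WCAG 2.x, using the 4.5:1 (normal text) and 3:1 (large text) thresholds from Contrast (Minimum). The function names are illustrative assumptions, not part of any media player API, and colors are assumed to be opaque sRGB triples with values from 0 to 255.

```python
def _linearize(channel):
    """Convert one 0-255 sRGB channel to linear light (WCAG 2.x formula)."""
    c = channel / 255
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(rgb):
    """Relative luminance of an (r, g, b) color, from 0 (black) to 1 (white)."""
    r, g, b = (_linearize(v) for v in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def contrast_ratio(foreground, background):
    """Contrast ratio between two colors, from 1 to 21."""
    l1 = relative_luminance(foreground)
    l2 = relative_luminance(background)
    lighter, darker = max(l1, l2), min(l1, l2)
    return (lighter + 0.05) / (darker + 0.05)


def meets_minimum_contrast(foreground, background, large_text=False):
    """Check the Contrast (Minimum) threshold: 4.5:1, or 3:1 for large text."""
    threshold = 3.0 if large_text else 4.5
    return contrast_ratio(foreground, background) >= threshold


if __name__ == "__main__":
    # Black text on a white background gives the maximum ratio of 21:1.
    print(round(contrast_ratio((0, 0, 0), (255, 255, 255)), 2))
```

A check like this covers only flat, opaque colors. Text over video frames or semi-transparent control bars still needs manual review.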

Some media players provide additional accessibility functionality to users such as:

Changing the speed of the video.
Setting how captions are displayed (for example text style, text size, colors, and position of the captions).
Reading the captions with a screen reader and braille device.
Interactive transcripts.

Keyboard Access Checklist

Can the user use a keyboard to operate the media player? (WCAG 2.1.1)
Is the media player free from keyboard traps? (WCAG 2.1.2)
Is the time-based media free from content that flashes more than three times per second? (WCAG 2.3.1)
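The three-flashes question can be framed as a sliding-window count. The sketch below assumes that a separate analysis step (not shown, and hypothetical) has already produced a list of times, in seconds, at which flashes occur. It reports whether any one-second period contains more than three of them. The function name and parameters are illustrative assumptions, and the sketch does not implement the general flash and red flash thresholds that WCAG 2.3.1 also allows for.

```python
from collections import deque


def exceeds_flash_limit(flash_times, max_flashes=3, window=1.0):
    """Return True if any `window`-second period holds more than `max_flashes` flashes.

    `flash_times` is an iterable of flash timestamps in seconds.
    """
    recent = deque()
    for t in sorted(flash_times):
        recent.append(t)
        # Discard flashes that are a full window or more older than this one.
        while t - recent[0] >= window:
            recent.popleft()
        if len(recent) > max_flashes:
            return True
    return False


if __name__ == "__main__":
    print(exceeds_flash_limit([0.0, 0.25, 0.5, 0.75]))  # four flashes within one second
    print(exceeds_flash_limit([0.0, 0.5, 1.0, 1.5]))    # never more than two per second
```

A result of False from a check like this does not replace testing with a dedicated flash analysis tool.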

Using a keyboard, does the media player:

Have a mechanism to pause or stop video? (WCAG 2.2.2)
Start at the user's request (does not automatically start playing)?
If it does start automatically, is there a mechanism to pause or stop the player within 3 focus points? (WCAG 1.4.2)
Have a mechanism to adjust the volume?
Have accurate button labels? (WCAG 2.4.6)

Does the time-based media player support (have an icon or link for):

Captions:
- Can they be turned on and off (if open captions are not presented)?
Audio descriptions:
- Can they be turned on and off?
Transcripts:
- Is the link to the transcripts descriptive?
- Is the link immediately following or preceding the media player?

Date modified:: 2023-11-24