
CELPIP Speaking Task 3: Picture Description (CLB 9 Sample Answer)

Last updated: June 2026

By Mark Wilson

Task 3 at a glance

You see a photograph and must describe it. You have 30 seconds of preparation time and 60 seconds of speaking time. This task tests your descriptive vocabulary, spatial language, and ability to infer meaning from visual context.

What scorers are looking for in Task 3

Most test-takers describe only what is literally visible. CLB 9 responses go further: they describe location, action, and emotion, and then infer what is happening in the scene. Scorers are assessing:

Coherence: Does your description follow a logical order (e.g. background → foreground → specific detail)?
Vocabulary: Do you use precise spatial language ("in the foreground", "to the left of", "partially obscured by") and descriptive adjectives?
Task Fulfillment: Do you fill the full speaking time without long pauses?

The 4-part structure for Task 3

Setting (2–3 sentences)

Where is this? Indoors or outdoors? What time of day, season, or type of location? "This image appears to show an outdoor market on what looks like a sunny afternoon."

Background (2 sentences)

What is in the background? Buildings, landscape, other people? Use spatial language: "In the background, there are several stalls with colourful canopies."

Foreground and main subjects (3–4 sentences)

This is the core of your response. Describe the main people or objects. Include actions, estimated ages, clothing if visible, and expressions.

Inference (1–2 sentences)

What might be happening? Why are these people here? "It appears that the two women are friends who have met at the market, possibly looking for fresh produce." Inference shows language range because it calls for hedging language such as "it appears that" and "possibly".

CLB 9 sample answer

Image description: A busy indoor shopping mall. In the background, several shops are visible with lit signs. In the foreground, two women are sitting on a bench looking at shopping bags. One woman appears to be in her 30s wearing a red jacket; the other is younger, in a grey coat. Both are smiling.

Sample response, CLB 9 level (about 150 words)

"This image shows the interior of a large, busy shopping mall. In the background, I can see a number of retail stores with brightly lit signs and displays, suggesting this is a popular commercial centre.

In the foreground, two women are seated together on a bench. The woman on the left appears to be in her early to mid-thirties and is wearing a red jacket. She is holding several shopping bags on her lap. The woman to her right is somewhat younger, dressed in a light grey coat, and is also holding bags. Both women are smiling and appear to be engaged in a conversation.

Based on what I can see, it looks as though the two women have spent some time shopping together and are taking a short break. The relaxed body language and smiles suggest they are enjoying themselves and are likely friends or family members on a shopping outing."
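
To check whether a practice answer is likely to fill the speaking time, one rough approach is to count its words and divide by a speaking rate. The Python sketch below is only an illustration: the function name estimate_seconds is hypothetical, and the default of 150 words per minute is an assumed conversational pace, not a figure from this guide or from CELPIP. Measure your own pace with a timer and adjust the value.

```python
def estimate_seconds(text: str, words_per_minute: float = 150.0) -> float:
    """Estimate how long it takes to say `text` aloud.

    The 150 words-per-minute default is an assumption for illustration;
    replace it with a pace you have measured for yourself.
    """
    word_count = len(text.split())  # words separated by whitespace
    return word_count * 60.0 / words_per_minute


# Opening paragraph of the sample response above.
opening = (
    "This image shows the interior of a large, busy shopping mall. "
    "In the background, I can see a number of retail stores with brightly "
    "lit signs and displays, suggesting this is a popular commercial centre."
)

print(len(opening.split()))                 # 35
print(round(estimate_seconds(opening), 1))  # 14.0
```

Running the same function on a full draft gives a quick sense of whether the answer is too short or too long before you practise it with a timer.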

Spatial vocabulary to learn before the test

These phrases signal a CLB 9+ response when used correctly:

"In the foreground / background / middle ground"
"To the left/right of", "directly behind", "partially obscured by"
"It appears that", "based on what I can see", "it looks as though"
"Approximately", "what appears to be", "presumably"
"Facing toward / away from the camera"
"In what appears to be a [indoor/outdoor/urban/rural] setting"

Preparation time strategy (30 seconds)

Use the 30 seconds to scan the image in order: top-left to bottom-right. Mentally note: (1) setting type, (2) main subjects and their positions, (3) one specific detail like clothing or expression, (4) one inference you can make. Do not write full sentences — just keywords. Your spoken response should flow naturally, not sound like you are reading notes.

Common mistakes in Task 3

Listing without describing. "There is a woman. There are shops. There are bags." — this is CLB 6 at best. Connect observations with spatial context and inference.
Stopping early. Silence in the final 10 seconds costs marks for Task Fulfillment. If you have covered the main content, move to inference or speculate about what happened before or after the scene.
Starting with "I see". It is grammatically correct but monotonous. Open with the setting instead: "This image appears to show..."
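
If you type out or transcribe a practice answer, two of the points above can be checked mechanically: whether sentences are bare "There is / There are" listings, and whether any of the spatial and hedging phrases from the vocabulary list appear. The Python sketch below is a minimal self-review aid under simple assumptions. The function names are hypothetical, the phrase list is copied from this guide, and plain substring matching will miss variations such as "to her right". It says nothing about how scorers actually rate a response.

```python
import re

# Spatial and hedging phrases taken from the vocabulary list in this guide.
PHRASES = [
    "in the foreground",
    "in the background",
    "in the middle ground",
    "to the left of",
    "to the right of",
    "directly behind",
    "partially obscured by",
    "it appears that",
    "based on what i can see",
    "it looks as though",
    "approximately",
    "what appears to be",
    "presumably",
    "facing toward",
    "facing away from",
]


def phrases_used(transcript: str) -> list[str]:
    """Return the listed phrases found in the transcript (case-insensitive)."""
    normalized = " ".join(transcript.lower().split())
    return [phrase for phrase in PHRASES if phrase in normalized]


def listing_sentences(transcript: str) -> list[str]:
    """Return sentences that begin with a bare 'There is' or 'There are'."""
    sentences = re.split(r"(?<=[.!?])\s+", transcript.strip())
    return [s for s in sentences if re.match(r"there (is|are)\b", s, re.IGNORECASE)]


listing = "There is a woman. There are shops. There are bags."
improved = (
    "In the foreground, two women are seated on a bench. "
    "Based on what I can see, it looks as though they are taking a break."
)

print(len(listing_sentences(listing)))   # 3
print(phrases_used(listing))             # []
print(len(listing_sentences(improved)))  # 0
print(phrases_used(improved))
# ['in the foreground', 'based on what i can see', 'it looks as though']
```

A high count from listing_sentences together with an empty result from phrases_used suggests the answer is listing objects rather than describing the scene.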
