Lesson 9 — Tree reduction

Adding N numbers on the GPU is not as simple as it looks. You cannot just let every thread accumulate into the same variable — concurrent writes would race. Instead, GPU reduction uses a tree: in each round, half the active threads each add one pair of values, halving the working set. After log⊂2;(N) rounds only one value remains: the total sum.

The pattern, for a single workgroup of 256 threads:

Load each element into shared memory (sdata), initialising out-of-bounds threads to 0.0.
Barrier — ensure all loads are complete.
Set stride s = 128. While s > 0:
- Thread tid < s adds sdata.(tid + s) into sdata.(tid).
- Barrier after each round.
- Halve the stride: s := s / 2.
Thread 0 writes sdata.(0) to output.(block_idx_x).

After 7 rounds (128 → 64 → 32 → 16 → 8 → 4 → 2 → 1) the full sum of all 256 elements lives in sdata.(0), then in output.(0).

Your task

The inner accumulation expression is missing. Replace (* TODO *) — currently the placeholder 0.0 — with the correct expression, then click Run on my GPU. The auto-checker compares output.(0) against the expected sum of all 256 input elements.

Kernel source

Fill in the TODO and click "Run on my GPU".

Hint

Each active thread (those with tid < s) should add the element at distance s from itself. The slot to add is sdata.(tid + s).

Replace (* TODO *) 0.0 with sdata.(tid + s). The full accumulation line becomes:

  sdata.(tid) <- sdata.(tid) +. sdata.(tid + s)

How the tree works step by step (N=8 example)

With sdata = [3, 1, 4, 1, 5, 9, 2, 6] and starting stride s = 4:

Round s=4:  sdata[0]+=sdata[4]  →  3+5=8
            sdata[1]+=sdata[5]  →  1+9=10
            sdata[2]+=sdata[6]  →  4+2=6
            sdata[3]+=sdata[7]  →  1+6=7
            sdata = [8, 10, 6, 7, ...]

Round s=2:  sdata[0]+=sdata[2]  →  8+6=14
            sdata[1]+=sdata[3]  →  10+7=17
            sdata = [14, 17, ...]

Round s=1:  sdata[0]+=sdata[1]  →  14+17=31
            sdata = [31, ...]   ✓  (3+1+4+1+5+9+2+6 = 31)

← Lesson 8 — Shared memory & barriers Back to course overview →