Learn GPU programming with Sarek
These interactive lessons teach the data-parallel programming model through short, runnable exercises. Each lesson lets you edit a Sarek kernel, click Run on my GPU, and instantly see whether your answer is correct. Answers are graded against a CPU reference running in the same browser tab.
What is GPGPU and what is a kernel?
A GPU (Graphics Processing Unit) has thousands of small cores that run in parallel. GPGPU (General-Purpose GPU computing) uses those cores to accelerate arbitrary numeric computation, not just graphics.
The central concept is the kernel: a function that runs once per element
of your dataset, each invocation handling exactly one element identified by
its thread index (global_thread_id in Sarek). If you have an array of
256 floats, the GPU launches 256 threads simultaneously, each computing one
output value independently. This is the data-parallel model.
Thread 0: reads a[0], b[0] → writes c[0]
Thread 1: reads a[1], b[1] → writes c[1]
...
Thread 255: reads a[255], b[255] → writes c[255]
No thread needs to wait for another: all 256 compute at once.
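The per-thread picture above can be emulated on the CPU. The following is a minimal JavaScript sketch, not Sarek code: launch and vectorAddKernel are hypothetical names chosen for illustration, and the loop runs the invocations one after another, whereas a GPU runs them in parallel.

```javascript
// CPU emulation of the data-parallel model (illustrative only; not Sarek's API).
// `launch` is a hypothetical helper: it calls the kernel once per thread index.
function launch(n, kernel, ...buffers) {
  for (let globalThreadId = 0; globalThreadId < n; globalThreadId++) {
    kernel(globalThreadId, ...buffers);
  }
}

// The kernel body handles exactly one element, chosen by its thread index.
function vectorAddKernel(i, a, b, c) {
  c[i] = a[i] + b[i];
}

const N = 256;
const a = Float32Array.from({ length: N }, (_, i) => i);
const b = Float32Array.from({ length: N }, (_, i) => 2 * i);
const c = new Float32Array(N);

launch(N, vectorAddKernel, a, b, c);
console.log(c[0], c[1], c[255]); // 0 3 765
```

Because no invocation reads a value written by another, the order in which the loop visits the indices does not affect the result. That independence is what lets a GPU run them all at once.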
How Sarek / SPOC works
Sarek lets you write GPU kernels in OCaml syntax. The compiler (transpiler) reads your kernel expression and emits GPU source code for multiple backends: CUDA, OpenCL, Metal, GLSL, and WGSL.
In these lessons, Sarek transpiles to WGSL (WebGPU Shading Language) and runs the shader on your own GPU directly in your browser via the WebGPU API. The transpiler itself is a WebAssembly + JavaScript bundle loaded from this page, so no server is involved.
How to read a Sarek kernel
A minimal kernel looks like:
fun (a : float32 vector) (b : float32 vector) ->
  let i = global_thread_id in
  b.(i) <- Float32.sin a.(i)
- fun (a : float32 vector) ...: the kernel's typed arguments. Buffers (arrays on the GPU) are elementType vector. Scalar values (ints, floats) are plain int32 / float32.
- let i = global_thread_id in: binds the thread index. Each of the N threads gets a unique value of i from 0 to N-1.
- b.(i) <- expr: writes expr into output buffer b at position i. Read accesses use a.(i) without the <-.
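For comparison, here is what the minimal kernel above computes, written as a plain JavaScript loop. This is a sketch only: sinReference is a hypothetical name, and the page's actual CPU reference may be written differently.

```javascript
// CPU counterpart of the minimal sine kernel (a sketch, not the page's grader).
// `sinReference` is a hypothetical name.
function sinReference(a) {
  const b = new Float32Array(a.length);
  for (let i = 0; i < a.length; i++) {
    // Storing into a Float32Array rounds the double-precision result to float32.
    b[i] = Math.sin(a[i]);
  }
  return b;
}

const input = Float32Array.from({ length: 256 }, (_, i) => i / 256);
const expected = sinReference(input);
console.log(expected[0]); // 0
```

The loop variable i plays the role of global_thread_id: each iteration corresponds to one GPU thread.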
How the interactive checker works
- You edit the kernel starter in the editor.
- Click Run on my GPU: the page calls SarekTranspile.transpileWithAbi(source, "wgsl") to compile your kernel to WGSL, then SarekWebGPU.run(...) to dispatch it on your GPU.
- The result is compared element-by-element against a CPU reference (JavaScript). A relative tolerance of ±0.1 % is allowed for floating-point rounding. You see PASS (green) or FAIL with the first mismatching index.
- Use Show generated WGSL to inspect the shader the transpiler produced.
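The comparison step can be sketched as follows. This is an illustration under stated assumptions, not the page's grader: firstMismatch is a hypothetical name, and the small absolute floor (absFloor) is an added assumption so that reference values of exactly zero do not demand a bit-exact match.

```javascript
// Element-by-element comparison with a relative tolerance (sketch).
// `firstMismatch` and the `absFloor` guard are assumptions, not the page's code.
function firstMismatch(actual, expected, relTol = 1e-3, absFloor = 1e-6) {
  if (actual.length !== expected.length) {
    throw new RangeError("actual and expected must have the same length");
  }
  for (let i = 0; i < expected.length; i++) {
    const diff = Math.abs(actual[i] - expected[i]);
    // Scale the allowed error by the reference magnitude; the floor keeps
    // the check meaningful when the reference value is zero or tiny.
    const allowed = Math.max(relTol * Math.abs(expected[i]), absFloor);
    // The negated comparison also flags NaN, for which `<=` is always false.
    if (!(diff <= allowed)) return i;
  }
  return -1; // no mismatch: PASS
}

const reference = [1.0, 2.0, 3.0];
console.log(firstMismatch([1.0005, 2.0, 3.0], reference)); // -1 (within 0.1 %)
console.log(firstMismatch([1.0, 2.1, 3.0], reference));    // 1 (first bad index)
```

Returning the index rather than a boolean lets the caller report the first mismatching element on FAIL.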
All datasets use N = 256 for instant iteration. Larger datasets (N ≥ 64 K) would show more realistic GPU speedups but take several seconds to grade in the browser.
Browser requirements
WebGPU requires a recent Chrome or Edge (version 113+). Firefox Nightly
also works with the dom.webgpu.enabled flag. If your browser does not
support WebGPU, the lesson text still renders and you can read the kernel, but
the Run button is disabled with an explanatory message.
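A page can detect WebGPU support by checking for navigator.gpu, the entry point defined by the WebGPU API. The sketch below shows one way to do that; hasWebGPU is a hypothetical name, and how these lessons actually perform the check is not stated here.

```javascript
// Feature-detection sketch. `navigator.gpu` is the WebGPU entry point;
// `hasWebGPU` is a hypothetical helper, not part of the lessons' code.
function hasWebGPU(nav) {
  return typeof nav === "object" && nav !== null && "gpu" in nav && Boolean(nav.gpu);
}

// In a browser this would be called as hasWebGPU(navigator).
// Plain objects stand in for `navigator` here so the sketch also runs outside one.
console.log(hasWebGPU({}));          // false
console.log(hasWebGPU({ gpu: {} })); // true
```

Even when navigator.gpu exists, navigator.gpu.requestAdapter() can resolve to null if no suitable GPU is available, so a complete check would also await that call before enabling the Run button.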
Lessons
- Lesson 1 - Vector addition: the data-parallel hello world
- Lesson 2 - Scalar parameters (SAXPY): passing a scalar into a kernel
- Lesson 3 - Elementwise map: mapping a function over every element
- Lesson 4 - Control flow and bounds: if expressions and index guarding
- Lesson 5 - Mandelbrot: a per-pixel loop that generates an image on your GPU
- Lesson 6 - Image filter: a grayscale photo filter rendered in the page (bring your own photo)
- Lesson 7 - Composing kernels: chain two kernels with an OCaml host (square then add, in two GPU passes)
- Lesson 8 - Shared memory & barriers: tiled copy with let%shared and block_barrier (why barriers matter)
- Lesson 9 - Tree reduction: parallel sum of 256 elements using a shared-memory tree reduction with barriers