Lesson 7 — Composing kernels
A single kernel maps over one array at a time, but many algorithms are
naturally expressed as a pipeline of kernels: the output of
one kernel becomes the input of the next. This lesson shows how to chain two
kernels — square and add — to compute
out[i] = a[i]² + b[i] in two GPU passes.
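As a point of comparison, the same two-pass computation can be sketched on the CPU in plain OCaml. This is only a reference model, not the kernel syntax used in the editors: ordinary float arrays stand in for GPU buffers, and the names square, add and reference are illustrative assumptions.

```ocaml
(* CPU reference for the two-pass pipeline.
   Plain float arrays stand in for GPU buffers; all names are illustrative. *)

(* Pass 1 (Kernel A): tmp[i] = a[i] * a[i] *)
let square a = Array.map (fun x -> x *. x) a

(* Pass 2 (Kernel B): out[i] = tmp[i] + b[i].
   Array.map2 raises Invalid_argument if the lengths differ. *)
let add tmp b = Array.map2 ( +. ) tmp b

(* The pipeline: the output of pass 1 is the input of pass 2. *)
let reference a b = add (square a) b

let () =
  let out = reference [| 1.0; 2.0; 3.0 |] [| 10.0; 20.0; 30.0 |] in
  Array.iter (Printf.printf "%g ") out;
  print_newline ()
```

A reference like this is handy for checking the GPU result element by element once both kernels are filled in.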
Unlike the earlier lessons, this one adds an OCaml host program
(compiled to JavaScript with js_of_ocaml) that orchestrates the two
GPU dispatches. The host transpiles each kernel source to WGSL, runs
Kernel A on a to produce a temporary buffer tmp, then runs Kernel B
on tmp and b to produce the final result, out.
Host orchestration (OCaml)
(* Host program (runs in the browser via js_of_ocaml) *)
let run kA_src kB_src ~a ~b ~cb =
  match of_source_with_abi WGSL kA_src with
  | Error e -> cb_err cb e
  | Ok (wgslA, abiA) ->
    Webgpu_runtime.run ~wgsl:wgslA ~abi_json:abiA
      ~inputs:[("a", a); ("tmp", zeros)]
      ~outputs_wanted:["tmp"]
      ~on_done:(fun [{tmp}] ->
        match of_source_with_abi WGSL kB_src with
        | Error e -> cb_err cb e
        | Ok (wgslB, abiB) ->
          Webgpu_runtime.run ~wgsl:wgslB ~abi_json:abiB
            ~inputs:[("tmp", tmp); ("b", b); ("out", zeros)]
            ~outputs_wanted:["out"]
            ~on_done:(fun [{out}] -> cb_ok cb out) ())
      ()
Note: the two kernels currently round-trip through the host between stages (tmp is read back from the GPU before being re-uploaded for Kernel B). True GPU-side buffer persistence — avoiding the round-trip — is a planned runtime enhancement.
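The callback chaining used by the host can be modelled on the CPU to make the control flow easier to follow. The sketch below is an assumption-laden stand-in, not the Webgpu_runtime API: run_stage, square, add and run_pipeline are hypothetical names, and each "dispatch" is just a function call that passes its result to on_done.

```ocaml
(* Hypothetical CPU stand-in for one GPU dispatch: applies [kernel] to the
   named inputs and hands the result to [on_done].
   This is NOT the Webgpu_runtime API, only a model of its callback style. *)
let run_stage ~kernel ~inputs ~on_done = on_done (kernel inputs)

(* Kernel A stand-in: tmp[i] = a[i] * a[i] *)
let square inputs = Array.map (fun x -> x *. x) (List.assoc "a" inputs)

(* Kernel B stand-in: out[i] = tmp[i] + b[i] *)
let add inputs =
  Array.map2 ( +. ) (List.assoc "tmp" inputs) (List.assoc "b" inputs)

(* Stage B starts only inside stage A's callback, so tmp is always
   complete before it is consumed. *)
let run_pipeline ~a ~b ~on_result =
  run_stage ~kernel:square ~inputs:[ ("a", a) ] ~on_done:(fun tmp ->
      run_stage ~kernel:add
        ~inputs:[ ("tmp", tmp); ("b", b) ]
        ~on_done:on_result)

let () =
  run_pipeline ~a:[| 1.0; 2.0 |] ~b:[| 0.5; 0.5 |] ~on_result:(fun out ->
      Array.iter (Printf.printf "%g ") out;
      print_newline ())
```

In this model tmp passes through host code between the two stages, which mirrors the round-trip described in the note above.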
Your task
Fill in the two kernels below and click Run on my GPU. Both editors use the same OCaml kernel syntax you have seen in earlier lessons. The host program above glues them together automatically.
Hint
Kernel A: a.(i) *. a.(i). Kernel B: tmp.(i) +. b.(i). OCaml uses the
dotted operators *. and +. for float multiplication and addition.