# Sarek Examples

Learn Sarek through practical examples that demonstrate different GPU computing patterns and optimizations.

## Memory & Bandwidth

### [Vector Addition](vector_add.html)

The classic "Hello World" of GPU computing. Demonstrates basic kernel structure, memory operations, and how to achieve peak memory bandwidth.

**Key concepts:** Thread indexing, memory coalescing, bandwidth optimization

### [Matrix Transpose](transpose.html)

Shows the impact of memory access patterns on performance. Compares naive vs tiled implementations.

**Key concepts:** Memory access patterns, shared memory, bank conflicts, tiling optimization
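
To make the naive vs tiled distinction concrete, here is a minimal CPU sketch in plain Python. It is not Sarek code and does not reflect the linked example's implementation; the function names and the `tile` parameter are hypothetical, and the small `buf` list only stands in for GPU shared memory.

```python
def transpose_naive(src, rows, cols):
    """Transpose a row-major rows x cols matrix stored as a flat list."""
    dst = [0] * (rows * cols)
    for r in range(rows):
        for c in range(cols):
            # Reads walk src contiguously, but writes jump by `rows` elements.
            dst[c * rows + r] = src[r * cols + c]
    return dst


def transpose_tiled(src, rows, cols, tile=4):
    """Same result, processed one tile at a time (hypothetical tile size)."""
    dst = [0] * (rows * cols)
    for r0 in range(0, rows, tile):
        for c0 in range(0, cols, tile):
            r_end = min(r0 + tile, rows)
            c_end = min(c0 + tile, cols)
            # Stage the tile in a small buffer (stand-in for shared memory).
            buf = [[src[r * cols + c] for c in range(c0, c_end)]
                   for r in range(r0, r_end)]
            # Write the tile back transposed; for a fixed c, the inner loop
            # over r touches consecutive positions in dst.
            for c in range(c0, c_end):
                for r in range(r0, r_end):
                    dst[c * rows + r] = buf[r - r0][c - c0]
    return dst
```

Both functions return the same matrix; the tiled version only changes the order of memory accesses, which is the property the GPU implementations exploit.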

## Compute-Bound Operations

### [Matrix Multiplication](matrix_mul.html)

A fundamental compute-intensive operation. Demonstrates how to maximize arithmetic throughput.

**Key concepts:** FLOPS optimization, cache utilization, algorithmic complexity

### [Mandelbrot Set](mandelbrot.html)

Classic fractal generation with heavy arithmetic per pixel. Shows embarrassingly parallel computation.

**Key concepts:** Complex arithmetic, iteration, 2D thread grids
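
The per-pixel work can be sketched as an escape-time loop. This is a plain-Python CPU illustration, not Sarek code: `escape_time`, `render`, the viewing window, and the `max_iter` default are all assumptions chosen for the sketch. Each pixel depends only on its own coordinates, which is why the two nested loops map directly onto a 2D thread grid.

```python
def escape_time(cx, cy, max_iter=100):
    """Count iterations of z = z*z + c until |z| > 2, capped at max_iter."""
    zx, zy = 0.0, 0.0
    for i in range(max_iter):
        if zx * zx + zy * zy > 4.0:  # |z|^2 > 4 means the point has escaped
            return i
        # Complex multiply written out on real and imaginary parts.
        zx, zy = zx * zx - zy * zy + cx, 2.0 * zx * zy + cy
    return max_iter


def render(width, height, max_iter=50):
    """Iteration counts over the assumed window [-2, 1) x [-1.5, 1.5)."""
    rows = []
    for py in range(height):        # on a GPU: thread index along y
        cy = -1.5 + 3.0 * py / height
        row = []
        for px in range(width):     # on a GPU: thread index along x
            cx = -2.0 + 3.0 * px / width
            row.append(escape_time(cx, cy, max_iter))
        rows.append(row)
    return rows
```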

## Parallel Patterns

### [Parallel Reduction](reduction.html)

Efficiently compute aggregate operations (sum, max, min) on large arrays using tree-based reduction.

**Key concepts:** Tree reduction, synchronization, warp-level primitives
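
As a rough illustration of the tree-based pattern, the sketch below simulates it sequentially in plain Python. It is not Sarek code; `tree_reduce` and its parameters are hypothetical. The operator is assumed to be associative and commutative, and `identity` must be its neutral element (0 for sum, negative infinity for max).

```python
def tree_reduce(values, op, identity):
    """Reduce `values` with `op` in log2(n) halving steps."""
    data = list(values)
    # Pad to a power of two so every step pairs elements evenly.
    n = 1
    while n < len(data):
        n *= 2
    data += [identity] * (n - len(data))

    stride = n // 2
    while stride > 0:
        # On a GPU, each i in this loop would be a separate thread.
        for i in range(stride):
            data[i] = op(data[i], data[i + stride])
        # A GPU kernel needs a barrier here before the next, narrower step.
        stride //= 2
    return data[0]
```

For example, `tree_reduce([3, 1, 4, 1, 5], max, float("-inf"))` walks three halving steps over eight padded slots instead of a single five-element sequential scan.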

## Performance Data

For detailed performance comparisons across different GPUs and backends, see the Benchmarks section.

## Next Steps