# Sarek Examples
Learn Sarek through practical examples that demonstrate different GPU computing patterns and optimizations.
## Memory & Bandwidth
### [Vector Addition](vector_add.html)
The classic "Hello World" of GPU computing. Demonstrates basic kernel structure, memory operations, and how to achieve peak memory bandwidth.
**Key concepts:** Thread indexing, memory coalescing, bandwidth optimization
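
As a rough illustration of thread indexing, the sketch below emulates a one-thread-per-element kernel on the CPU in plain Python. It is not Sarek kernel code; the function name `vector_add` and the `block_size` parameter are hypothetical and chosen only for this example.

```python
def vector_add(a, b, block_size=4):
    """CPU emulation of a one-thread-per-element GPU kernel (illustrative only)."""
    n = len(a)
    c = [0.0] * n
    num_blocks = (n + block_size - 1) // block_size  # ceil(n / block_size)
    for block_idx in range(num_blocks):
        for thread_idx in range(block_size):
            # Global index: neighbouring threads touch neighbouring elements,
            # which is the access pattern that coalesces well on a GPU.
            i = block_idx * block_size + thread_idx
            if i < n:  # bounds guard: the last block may be partly idle
                c[i] = a[i] + b[i]
    return c


result = vector_add([1.0, 2.0, 3.0, 4.0, 5.0], [10.0, 20.0, 30.0, 40.0, 50.0])
```

Each element is read twice and written once, so a run over `n` elements moves `3 * n * element_size` bytes. Dividing that figure by the measured kernel time gives the effective bandwidth to compare against the hardware peak.
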
### [Matrix Transpose](transpose.html)
Shows the impact of memory access patterns on performance. Compares naive vs tiled implementations.
**Key concepts:** Memory access patterns, shared memory, bank conflicts, tiling optimization
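
The tiling idea can be sketched on the CPU as follows. This is plain Python rather than Sarek code, the name `transpose_tiled` is hypothetical, and the small `buf` list merely stands in for what would be a shared-memory tile on a GPU.

```python
def transpose_tiled(src, rows, cols, tile=2):
    """Transpose a row-major rows x cols matrix stored as a flat list."""
    dst = [0] * (rows * cols)
    for tile_r in range(0, rows, tile):
        for tile_c in range(0, cols, tile):
            # Stage one tile in a small buffer (stand-in for shared memory).
            buf = [[src[r * cols + c]
                    for c in range(tile_c, min(tile_c + tile, cols))]
                   for r in range(tile_r, min(tile_r + tile, rows))]
            # Write the tile back transposed, walking each destination row
            # in order so that writes land on consecutive addresses.
            for c_off in range(len(buf[0])):
                for r_off in range(len(buf)):
                    dst[(tile_c + c_off) * rows + (tile_r + r_off)] = buf[r_off][c_off]
    return dst


matrix = [1, 2, 3,
          4, 5, 6]  # 2 x 3, row-major
transposed = transpose_tiled(matrix, rows=2, cols=3)  # 3 x 2, row-major
```

A naive transpose reads rows and writes columns directly, so one of the two sides always strides through memory. Staging through a tile lets both the read and the write walk contiguous addresses.
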
## Compute-Bound Operations
### [Matrix Multiplication](matrix_mul.html)
A fundamental compute-intensive operation. Demonstrates how to maximize arithmetic throughput.
**Key concepts:** FLOPS optimization, cache utilization, algorithmic complexity
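
To make the cache-utilization point concrete, here is a minimal CPU sketch of a blocked (tiled) matrix multiply in plain Python. It is not Sarek code, and `matmul_tiled` and its `tile` parameter are hypothetical names used only for illustration.

```python
def matmul_tiled(a, b, n, tile=2):
    """Multiply two n x n row-major matrices stored as flat lists."""
    c = [0.0] * (n * n)
    for i0 in range(0, n, tile):
        for j0 in range(0, n, tile):
            for k0 in range(0, n, tile):
                # Accumulate the contribution of one pair of tiles, so the
                # same small blocks of a and b are reused while still "hot".
                for i in range(i0, min(i0 + tile, n)):
                    for j in range(j0, min(j0 + tile, n)):
                        acc = c[i * n + j]
                        for k in range(k0, min(k0 + tile, n)):
                            acc += a[i * n + k] * b[k * n + j]
                        c[i * n + j] = acc
    return c


m = [1.0, 2.0,
     3.0, 4.0]
product = matmul_tiled(m, m, n=2)
```

The standard algorithm performs `2 * n**3` floating-point operations on `3 * n**2` matrix elements, which is why data reuse, rather than raw bandwidth, tends to decide performance for this operation.
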
### [Mandelbrot Set](mandelbrot.html)
Classic fractal generation with heavy arithmetic per pixel. Shows embarrassingly parallel computation.
**Key concepts:** Complex arithmetic, iteration, 2D thread grids
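
The per-pixel work can be sketched with the usual escape-time iteration. The code below is a plain Python CPU version, not Sarek code; `escape_count`, `render`, and the viewing window `[-2, 1] x [-1, 1]` are assumptions made for this example.

```python
def escape_count(cx, cy, max_iter=50):
    """Iterate z -> z*z + c from z = 0 and count steps until |z| > 2."""
    x = y = 0.0
    for i in range(max_iter):
        if x * x + y * y > 4.0:  # |z|^2 > 4 means the point has escaped
            return i
        # Complex square plus c, written out in real arithmetic.
        x, y = x * x - y * y + cx, 2.0 * x * y + cy
    return max_iter


def render(width, height, max_iter=50):
    """One escape count per pixel over the window [-2, 1] x [-1, 1]."""
    image = []
    for py in range(height):      # (px, py) plays the role of a 2D thread index
        row = []
        for px in range(width):
            cx = -2.0 + 3.0 * px / width
            cy = -1.0 + 2.0 * py / height
            row.append(escape_count(cx, cy, max_iter))
        image.append(row)
    return image


image = render(6, 4)
```

No pixel depends on any other, so on a GPU each `(px, py)` pair can map to its own thread in a 2D grid with no communication between threads.
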
## Parallel Patterns
### [Parallel Reduction](reduction.html)
Computes aggregates (sum, max, min) over large arrays efficiently using tree-based reduction.
**Key concepts:** Tree reduction, synchronization, warp-level primitives
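
A tree reduction can be sketched on the CPU like this. It is plain Python rather than Sarek code, the name `tree_reduce` is hypothetical, and it covers only the basic tree and its synchronization points, not warp-level primitives.

```python
import operator


def tree_reduce(values, op, identity):
    """Reduce with a pairwise tree, halving the active range each step."""
    data = list(values)
    n = 1
    while n < len(data):
        n *= 2
    data += [identity] * (n - len(data))  # pad to a power of two
    stride = n // 2
    while stride > 0:
        for i in range(stride):  # each i stands for one active thread
            data[i] = op(data[i], data[i + stride])
        # On a GPU, a barrier would be needed here before the next step.
        stride //= 2
    return data[0]


samples = [3, 1, 4, 1, 5, 9, 2, 6]
total = tree_reduce(samples, operator.add, 0)
largest = tree_reduce(samples, max, float("-inf"))
```

The padded array of length `n` is reduced in `log2(n)` steps instead of `n - 1` sequential ones. The operator must be associative, and `identity` must be its neutral element so that the padding does not change the result.
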
## Performance Data
For detailed performance comparisons across different GPUs and backends, see the Benchmarks section.
## Next Steps
- Getting Started Guide - Set up your environment
- Concepts - Understanding Sarek’s design
- API Documentation - Complete API reference