P0-C2: wgpu Pipeline Model
Taught: 2026-04-24 (during Phase 0 Bootstrap)
Milestone: Phase 0 — Bootstrap
Result: PASS (after architecture-diagram re-teach; original abstract teaching was insufficient)
Backfilled: 2026-05-04 — original prose was scattered across the early planning session; reconstructed into the corrected conversational format
Why this concept matters here
This engine renders through wgpu, and Phase 0's goal is "WebGPU canvas renders a non-black pixel." Understanding the wgpu pipeline model — Device, Queue, Surface, Pipeline, BindGroup, command encoder — is the foundation for Renderer::new() and every subsequent shader change.

A side question Suriya raised before this concept: "Why wgpu over Slang or OSL?" Answered briefly in Step 7.
The walkthrough
Step 1: The four core handles
Forget the API for a moment. Every modern graphics API (Vulkan, Metal, DirectX 12, WebGPU) has these four handles:
- Adapter — represents a physical or virtual GPU. "What hardware do I have?" You usually have one (your GPU); on systems with multiple GPUs you might pick one. Created once per app.
- Device — your handle for creating resources. "Make me a buffer." "Make me a texture." "Make me a render pipeline." Created once from an Adapter.
- Queue — your handle for submitting work. "Run this list of commands on the GPU." Created alongside Device.
- Surface — the canvas / window you're drawing TO. "Where does the final pixel go?" Created from a window/canvas handle.
In wgpu Rust:
```rust
let adapter = instance.request_adapter(&options).await?;
let (device, queue) = adapter.request_device(&desc).await?;
let surface = instance.create_surface(canvas)?;
```

Adapter, Device, and Queue are created once at startup. Surface is created once and reconfigured on resize.
Step 2: The render pipeline — created once, reused every frame
A render pipeline is a complete description of "how to draw stuff":
- Which vertex shader to run
- Which fragment shader to run
- What vertex format the input is in
- What blending state to use
- What primitive type (triangle, line, point)
- What format the output texture is
This is one big object you create once, at startup. You don't recreate it per frame. Per frame, you set this pipeline as active and submit draw calls.
```rust
let pipeline = device.create_render_pipeline(&RenderPipelineDescriptor {
    vertex: VertexState { module: &shader, entry_point: "vs_main", ... },
    fragment: Some(FragmentState { module: &shader, entry_point: "fs_main", ... }),
    primitive: PrimitiveState::default(),
    ...
});
```

The pipeline holds references to the shader modules, vertex layouts, etc. Created once; used for every frame's draw calls until something changes (e.g., the user toggles a feature → new pipeline).
Step 3: Bind groups — "here are the resources this shader reads"
Shaders need data: uniform values (camera position, time), textures, sampler states. Bind groups are how you tell the GPU "here's the data the shader will read."
You declare a BindGroupLayout (a schema: "binding 0 is a uniform buffer, binding 1 is a texture, binding 2 is a sampler"). Then you create concrete BindGroup instances that match the schema (actual buffer + texture + sampler).
```rust
let bgl = device.create_bind_group_layout(&BindGroupLayoutDescriptor {
    entries: &[
        BindGroupLayoutEntry {
            binding: 0,
            visibility: ShaderStages::FRAGMENT,
            ty: BindingType::Buffer { ty: BufferBindingType::Uniform, ... },
            ...
        },
    ],
    ...
});
let bind_group = device.create_bind_group(&BindGroupDescriptor {
    layout: &bgl,
    entries: &[BindGroupEntry { binding: 0, resource: uniform_buf.as_entire_binding() }],
    ...
});
```

In WGSL the shader sees:
```wgsl
@group(0) @binding(0) var<uniform> u: SceneUniforms;
```

The @group(0) @binding(0) matches the bind group's slot. The pipeline holds the layout; per-frame draw commands attach an actual bind group instance.
Step 4: The per-frame submission pattern
Every frame, you do:
```rust
// 1. Get the next surface texture (the canvas pixel buffer)
let frame = surface.get_current_texture()?;
let view = frame.texture.create_view(&Default::default());

// 2. Update any per-frame data (write uniforms)
queue.write_buffer(&uniform_buf, 0, bytemuck::bytes_of(&new_uniforms));

// 3. Encode commands
let mut encoder = device.create_command_encoder(&Default::default());
{
    let mut pass = encoder.begin_render_pass(&RenderPassDescriptor {
        color_attachments: &[Some(RenderPassColorAttachment { view: &view, ... })],
        ...
    });
    pass.set_pipeline(&pipeline);
    pass.set_bind_group(0, &bind_group, &[]);
    pass.draw(0..3, 0..1); // draw 3 vertices = our fullscreen triangle
}

// 4. Submit and present
queue.submit(std::iter::once(encoder.finish()));
frame.present();
```

The encoder is throwaway. You build a list of commands, submit it to the queue, and the GPU runs them. Then you call frame.present() to swap the rendered texture onto the screen.
Step 5: Startup vs per-frame — the cost split
This is the key mental model. In wgpu (and every modern graphics API), there are two kinds of work:
Created once at startup (expensive, slow):
- Adapter, Device, Queue, Surface
- Shader modules (compiled from WGSL)
- Render pipelines
- Bind group layouts
- Buffers, textures (allocated)
- Bind groups (binding actual resources to layout)
Done every frame (cheap, fast):
- queue.write_buffer for per-frame data (camera, time, scene state)
- encoder.begin_render_pass, set_pipeline, set_bind_group, draw
- queue.submit, frame.present
The whole API is designed around this split — the driver precompiles the pipeline's shaders, validates the bind layouts, and allocates GPU memory once. Per-frame work is just "here's a list of pointers and draw calls, go." This is why modern graphics is fast: don't burn CPU cycles per frame on validation that could be done once.
Step 6: The fullscreen triangle trick
For a renderer that ray-marches, we don't have geometry to rasterize. We just want the fragment shader to run for every pixel of the canvas, and decide each pixel's color.
Standard trick: draw one triangle that covers the entire screen, with no vertex buffer needed. The vertex shader emits three positions:
```wgsl
@vertex
fn vs_fullscreen(@builtin(vertex_index) vi: u32) -> VertexOut {
    var positions = array<vec2<f32>, 3>(
        vec2<f32>(-1.0, -1.0),
        vec2<f32>( 3.0, -1.0),
        vec2<f32>(-1.0,  3.0),
    );
    // ...
}
```

These three points form a triangle that extends past the screen edges. After clipping to NDC [-1, 1], the visible portion covers the entire screen. The fragment shader runs once per pixel. Then pass.draw(0..3, 0..1) — three vertices, one instance.
No vertex buffers. No vertex layout. Just: "draw 3 vertices using this pipeline, each pixel will execute the fragment shader."
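If the coverage claim feels hand-wavy, it can be checked directly. A standalone sketch (not engine code) that verifies all four corners of the NDC square [-1, 1]² fall inside the oversized triangle, using an inclusive edge-function test:

```rust
// Verify that the triangle (-1,-1), (3,-1), (-1,3) covers every corner of
// the NDC square [-1, 1]^2 — so after clipping, every pixel of the screen
// gets a fragment.

/// 2D cross product of (b - a) and (p - a); >= 0 means p is on or to the
/// left of the directed edge a -> b.
fn edge(a: (f32, f32), b: (f32, f32), p: (f32, f32)) -> f32 {
    (b.0 - a.0) * (p.1 - a.1) - (b.1 - a.1) * (p.0 - a.0)
}

/// Inclusive point-in-triangle test for a counter-clockwise triangle.
fn inside(tri: [(f32, f32); 3], p: (f32, f32)) -> bool {
    edge(tri[0], tri[1], p) >= 0.0
        && edge(tri[1], tri[2], p) >= 0.0
        && edge(tri[2], tri[0], p) >= 0.0
}

fn main() {
    let tri = [(-1.0, -1.0), (3.0, -1.0), (-1.0, 3.0)];
    for corner in [(-1.0, -1.0), (1.0, -1.0), (-1.0, 1.0), (1.0, 1.0)] {
        assert!(inside(tri, corner), "corner {:?} not covered", corner);
    }
    println!("all four NDC corners covered");
}
```

Note that the corner (1, 1) lies exactly on the triangle's hypotenuse (the line x + y = 2), which is why the test must be inclusive (>= 0) — the clipped triangle covers the square with zero margin along that edge.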
Step 7: Why wgpu over Slang / OSL / OpenCL?
(Asked separately during the Phase 0 design discussion; recorded here for completeness.)
- wgpu is the Rust binding for WebGPU — runs in browsers AND natively (Metal/Vulkan/DX12 backends). Same code, multi-platform.
- Slang is a shader language (Nvidia/research). Compiles to HLSL/SPIRV. Doesn't have first-class web/wasm story yet (as of early 2026). Excellent language; lacks the deployment target we need.
- OSL (Open Shading Language) is for offline rendering (path tracers like Arnold, Cycles). Not real-time, not GPU. Wrong tool.
- OpenCL is general-purpose compute. Not graphics. Doesn't have a render pipeline.
For "browser-native real-time renderer," wgpu + WGSL is the obvious right choice. The portability claim is genuine — same code runs in Chrome/Safari/Firefox AND as a native macOS/Windows/Linux app via Tauri later.
The mental model in one sentence
wgpu has four core handles (Adapter, Device, Queue, Surface) that you set up once; render pipelines and bind groups describe "how to draw" and "what to read" and are also created once; per-frame work is just "encode commands, submit, present" against those pre-built objects.
Explain-back question
When Engine::create() runs and the browser starts rendering, walk through which of the wgpu objects are created once at startup vs every frame, and explain why the GPU is fast precisely because of this split.
User's answer (PASS, on second attempt)
First we initialize the wgpu pipeline: this includes registering the device, creating the pipeline/queue and compiling the shader into uniforms in the GPU. Then per-frame, we write to uniforms and submit a render pass. Vertex shader for fullscreen triangle, fragment shader takes the uniforms and computes per pixel color.
Judgment
PASS. Got:
- Startup vs per-frame split — registered device, created pipeline/queue, compiled shader once; uniforms updated and render pass submitted per frame ✓
- Vertex/fragment role — VS produces the fullscreen triangle, FS computes per-pixel color ✓
- Why it's fast — implicit in his framing (don't redo expensive setup each frame) ✓
First attempt was rougher (mixed up the layers). Second pass after I drew the architecture diagram showing the four handles + per-frame loop was clean.