P0-C2: wgpu Pipeline Model
Taught: 2026-04-24 (during Phase 0 Bootstrap)
Milestone: Phase 0 — Bootstrap
Result: PASS (after architecture-diagram re-teach; original abstract teaching was insufficient)
Backfilled: 2026-05-04 — original prose was scattered across the early planning session; reconstructed into the corrected conversational format
Why this concept matters here
This engine renders through wgpu, and Phase 0's goal is "WebGPU canvas renders a non-black pixel." Understanding the wgpu pipeline model — Device, Queue, Surface, Pipeline, BindGroup, command encoder — is the foundation for Renderer::new() and every subsequent shader change.

A side question Suriya raised before this concept: "Why wgpu over Slang or OSL?" Answered briefly in Step 7.
The walkthrough
Step 1: The four core handles
Forget the API for a moment. Every modern graphics API (Vulkan, Metal, DirectX 12, WebGPU) has these four handles:
- Adapter — represents a physical or virtual GPU. "What hardware do I have?" You usually have one (your GPU); on systems with multiple GPUs you might pick one. Created once per app.
- Device — your handle for creating resources. "Make me a buffer." "Make me a texture." "Make me a render pipeline." Created once from an Adapter.
- Queue — your handle for submitting work. "Run this list of commands on the GPU." Created alongside Device.
- Surface — the canvas / window you're drawing TO. "Where does the final pixel go?" Created from a window/canvas handle.
In wgpu Rust:
```rust
let adapter = instance.request_adapter(&options).await?;
let (device, queue) = adapter.request_device(&desc).await?;
let surface = instance.create_surface(canvas)?;
```

Adapter, Device, and Queue are created once at startup. Surface is created once and reconfigured on resize.
Step 2: The render pipeline — created once, reused every frame
A render pipeline is a complete description of "how to draw stuff":
- Which vertex shader to run
- Which fragment shader to run
- What vertex format the input is in
- What blending state to use
- What primitive type (triangle, line, point)
- What format the output texture is
This is one big object you create once, at startup. You don't recreate it per frame. Per frame, you set this pipeline as active and submit draw calls.
```rust
let pipeline = device.create_render_pipeline(&RenderPipelineDescriptor {
    vertex: VertexState { module: &shader, entry_point: "vs_main", ... },
    fragment: Some(FragmentState { module: &shader, entry_point: "fs_main", ... }),
    primitive: PrimitiveState::default(),
    ...
});
```

The pipeline holds references to the shader modules, vertex layouts, etc. Created once; used for every frame's draw calls until something changes (e.g., the user toggles a feature → new pipeline).
Step 3: Bind groups — "here are the resources this shader reads"
Shaders need data: uniform values (camera position, time), textures, sampler states. Bind groups are how you tell the GPU "here's the data the shader will read."
You declare a BindGroupLayout (a schema: "binding 0 is a uniform buffer, binding 1 is a texture, binding 2 is a sampler"). Then you create concrete BindGroup instances that match the schema (actual buffer + texture + sampler).
```rust
let bgl = device.create_bind_group_layout(&BindGroupLayoutDescriptor {
    entries: &[
        BindGroupLayoutEntry {
            binding: 0,
            visibility: ShaderStages::FRAGMENT,
            ty: BindingType::Buffer { ty: BufferBindingType::Uniform, ... },
            ...
        },
    ],
    ...
});
let bind_group = device.create_bind_group(&BindGroupDescriptor {
    layout: &bgl,
    entries: &[BindGroupEntry { binding: 0, resource: uniform_buf.as_entire_binding() }],
    ...
});
```

In WGSL the shader sees:
```wgsl
@group(0) @binding(0) var<uniform> u: SceneUniforms;
```

The @group(0) @binding(0) matches the bind group's slot. The pipeline holds the layout; per-frame draw commands attach an actual bind group instance.
Step 4: The per-frame submission pattern
Every frame, you do:
```rust
// 1. Get the next surface texture (the canvas pixel buffer)
let frame = surface.get_current_texture()?;
let view = frame.texture.create_view(&Default::default());

// 2. Update any per-frame data (write uniforms)
queue.write_buffer(&uniform_buf, 0, bytemuck::bytes_of(&new_uniforms));

// 3. Encode commands
let mut encoder = device.create_command_encoder(&Default::default());
{
    let mut pass = encoder.begin_render_pass(&RenderPassDescriptor {
        color_attachments: &[Some(RenderPassColorAttachment { view: &view, ... })],
        ...
    });
    pass.set_pipeline(&pipeline);
    pass.set_bind_group(0, &bind_group, &[]);
    pass.draw(0..3, 0..1); // draw 3 vertices = our fullscreen triangle
}

// 4. Submit and present
queue.submit(std::iter::once(encoder.finish()));
frame.present();
```

The encoder is throwaway. You build a list of commands, submit it to the queue, and the GPU runs them. Then you call frame.present() to swap the rendered texture onto the screen.
Step 5: Startup vs per-frame — the cost split
This is the key mental model. In wgpu (and every modern graphics API), there are two kinds of work:
Created once at startup (expensive, slow):
- Adapter, Device, Queue, Surface
- Shader modules (compiled from WGSL)
- Render pipelines
- Bind group layouts
- Buffers, textures (allocated)
- Bind groups (binding actual resources to layout)
Done every frame (cheap, fast):
- queue.write_buffer for per-frame data (camera, time, scene state)
- encoder.begin_render_pass, set_pipeline, set_bind_group, draw
- queue.submit, frame.present
The whole API is designed around this split — the driver precompiles the pipeline's shaders, validates the bind layouts, and allocates GPU memory once. Per-frame work is just "here's a list of pointers and draw calls, go." This is why modern graphics is fast: don't burn CPU cycles per frame on validation that could be done once.
Step 6: The fullscreen triangle trick
For a renderer that ray-marches, we don't have geometry to rasterize. We just want the fragment shader to run for every pixel of the canvas, and decide each pixel's color.
Standard trick: draw one triangle that covers the entire screen, with no vertex buffer needed. The vertex shader emits three positions:
```wgsl
@vertex
fn vs_fullscreen(@builtin(vertex_index) vi: u32) -> VertexOut {
    var positions = array<vec2<f32>, 3>(
        vec2<f32>(-1.0, -1.0),
        vec2<f32>( 3.0, -1.0),
        vec2<f32>(-1.0,  3.0),
    );
    // ...
}
```

These three points form a triangle that extends past the screen edges. After clipping to NDC [-1, 1], the visible portion covers the entire screen. The fragment shader runs once per pixel. Then pass.draw(0..3, 0..1) — three vertices, one instance.
No vertex buffers. No vertex layout. Just: "draw 3 vertices using this pipeline, each pixel will execute the fragment shader."
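If the coverage claim feels hand-wavy, it can be checked directly. A standalone sketch (not engine code) that verifies all four corners of the NDC square [-1, 1]² fall inside the oversized triangle, using an inclusive edge-function test:

```rust
// Verify that the triangle (-1,-1), (3,-1), (-1,3) covers every corner of
// the NDC square [-1, 1]^2 — so after clipping, every pixel of the screen
// gets a fragment.

/// 2D cross product of (b - a) and (p - a); >= 0 means p is on or to the
/// left of the directed edge a -> b.
fn edge(a: (f32, f32), b: (f32, f32), p: (f32, f32)) -> f32 {
    (b.0 - a.0) * (p.1 - a.1) - (b.1 - a.1) * (p.0 - a.0)
}

/// Inclusive point-in-triangle test for a counter-clockwise triangle.
fn inside(tri: [(f32, f32); 3], p: (f32, f32)) -> bool {
    edge(tri[0], tri[1], p) >= 0.0
        && edge(tri[1], tri[2], p) >= 0.0
        && edge(tri[2], tri[0], p) >= 0.0
}

fn main() {
    let tri = [(-1.0, -1.0), (3.0, -1.0), (-1.0, 3.0)];
    for corner in [(-1.0, -1.0), (1.0, -1.0), (-1.0, 1.0), (1.0, 1.0)] {
        assert!(inside(tri, corner), "corner {:?} not covered", corner);
    }
    println!("all four NDC corners covered");
}
```

Note that the corner (1, 1) lies exactly on the triangle's hypotenuse (the line x + y = 2), which is why the test must be inclusive (>= 0) — the clipped triangle covers the square with zero margin along that edge.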
Step 7: Why wgpu over Slang / OSL / OpenCL?
(Asked separately during the Phase 0 design discussion; recorded here for completeness.)
- wgpu is the Rust binding for WebGPU — runs in browsers AND natively (Metal/Vulkan/DX12 backends). Same code, multi-platform.
- Slang is a shader language (Nvidia/research). Compiles to HLSL/SPIRV. Doesn't have first-class web/wasm story yet (as of early 2026). Excellent language; lacks the deployment target we need.
- OSL (Open Shading Language) is for offline rendering (path tracers like Arnold, Cycles). Not real-time, not GPU. Wrong tool.
- OpenCL is general-purpose compute. Not graphics. Doesn't have a render pipeline.
For "browser-native real-time renderer," wgpu + WGSL is the obvious right choice. The portability claim is genuine — same code runs in Chrome/Safari/Firefox AND as a native macOS/Windows/Linux app via Tauri later.
The mental model in one sentence
wgpu has four core handles (Adapter, Device, Queue, Surface) that you set up once; render pipelines and bind groups describe "how to draw" and "what to read" and are also created once; per-frame work is just "encode commands, submit, present" against those pre-built objects.
Explain-back question
When Engine::create() runs and the browser starts rendering, walk through which of the wgpu objects are created once at startup vs every frame, and explain why the GPU is fast precisely because of this split.
User's answer (PASS, on second attempt)
First we initialize the wgpu pipeline: this includes registering the device, creating the pipeline/queue and compiling the shader into uniforms in the GPU. Then per-frame, we write to uniforms and submit a render pass. Vertex shader for fullscreen triangle, fragment shader takes the uniforms and computes per pixel color.
Judgment
PASS. Got:
- Startup vs per-frame split — registered device, created pipeline/queue, compiled shader once; uniforms updated and render pass submitted per frame ✓
- Vertex/fragment role — VS produces the fullscreen triangle, FS computes per-pixel color ✓
- Why it's fast — implicit in his framing (don't redo expensive setup each frame) ✓
First attempt was rougher (mixed up the layers). Second pass after I drew the architecture diagram showing the four handles + per-frame loop was clean.