
Anatomy of a Frame — The Complete Lifecycle of suckless-ogl

Posted on Sun 29 March 2026 in Development


From main() to photons on screen — a full deep-dive into a modern OpenGL PBR engine written in C.

The final render of suckless-ogl — 100 PBR spheres lit by IBL

The final render: 100 metallic and dielectric spheres, lit by an HDR environment map, with full post-processing.


Introduction

suckless-ogl is a minimalist, high-performance PBR (Physically-Based Rendering) engine written in C11 with OpenGL 4.4 Core Profile. It displays a grid of 100 spheres with varied materials (metals, dielectrics, paints, organics…) lit by Image-Based Lighting (IBL), with a full post-processing pipeline: bloom, depth of field, motion blur, FXAA, tone mapping, and color grading.

This article traces the complete lifecycle of the application: from the first byte allocated in main() to the moment the GPU presents the first fully-lit frame on screen. We'll walk through every layer — CPU memory, GPU resources, the X11/GLFW windowing handshake, OpenGL context creation, shader compilation, async texture loading, and the multi-pass rendering architecture that produces each frame.

What We'll Cover

| Chapter | Topic |
| --- | --- |
| 1 | The entry point (main()) |
| 2 | Opening a window (GLFW + X11 + OpenGL) |
| 3 | CPU-side initialization (camera, threads, buffers) |
| 4 | Scene initialization (GPU) |
| 5 | Post-processing pipeline |
| 6 | Async HDR environment loading |
| 7 | Progressive IBL generation |
| 8 | The main loop |
| 9 | Rendering a frame |
| 10 | Post-processing in detail |
| 11 | The first visible frame |
| 12 | GPU memory budget |

Chapter 1 — The Entry Point

Everything begins in main() (src/main.c):

int main(int argc, char* argv[])
{
    tracy_manager_init_global();          // 1. Profiler bootstrap

    CliAction action = cli_handle_args(argc, argv);  // 2. CLI parsing
    if (action == CLI_ACTION_EXIT_SUCCESS) return EXIT_SUCCESS;
    if (action == CLI_ACTION_EXIT_FAILURE) return EXIT_FAILURE;

    // 3. SIMD-aligned allocation of the App structure
    App* app = (App*)platform_aligned_alloc(sizeof(App), SIMD_ALIGNMENT);
    *app = (App){0};

    // 4. Full initialization
    if (!app_init(app, WINDOW_WIDTH, WINDOW_HEIGHT, "Icosphere Phong"))
        { app_cleanup(app); platform_aligned_free(app); return EXIT_FAILURE; }

    // 5. Main loop
    app_run(app);

    // 6. Cleanup
    app_cleanup(app);
    platform_aligned_free(app);
    return EXIT_SUCCESS;
}

The design is intentionally simple: all complexity is encapsulated in app_init() → app_run() → app_cleanup().

Design Decisions

| Decision | Why? |
| --- | --- |
| SIMD-aligned allocation | The App struct contains mat4/vec3 fields (via cglm) that benefit from 16-byte alignment for SSE/NEON vectorization |
| Zero-init {0} | Deterministic state: every pointer starts NULL, every flag starts 0 |
| Tracy first | The profiler must be initialized before all other subsystems to capture the full timeline |
| Single App struct | All application state lives in one contiguous allocation: cache-friendly, easy to pass around |

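
The aligned allocation in step 3 can be sketched with C11's aligned_alloc. The engine's platform_aligned_alloc() is not shown in this article, so demo_aligned_alloc and demo_is_aligned below are hypothetical stand-ins, and the rounding step is an assumption about how such a wrapper might satisfy the C11 size requirement.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for platform_aligned_alloc(); the real
 * implementation is not shown here. `alignment` must be a power of two. */
static void* demo_aligned_alloc(size_t size, size_t alignment)
{
    /* C11 aligned_alloc expects the size to be a multiple of the
     * alignment, so round it up first. */
    size_t rounded = (size + alignment - 1) & ~(alignment - 1);
    return aligned_alloc(alignment, rounded);
}

/* Returns non-zero when `ptr` sits on an `alignment`-byte boundary. */
static int demo_is_aligned(const void* ptr, size_t alignment)
{
    return ((uintptr_t)ptr & (alignment - 1)) == 0;
}
```

A caller would typically check the returned pointer for NULL before zero-initializing the structure, and release it with free() (or the matching platform wrapper).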
graph TD
    A("🚀 main()") --> B("app_init()")
    B --> B1("Window + OpenGL Context")
    B --> B2("Camera & Input")
    B --> B3("Scene — GPU Resources")
    B --> B4("Async Loader Thread")
    B --> B5("Post-Processing Pipeline")
    B --> B6("Profiling Systems")
    B1 & B2 & B3 & B4 & B5 & B6 --> C("app_run() — Main Loop")
    C --> C1("Poll Events")
    C1 --> C2("Camera Physics")
    C2 --> C3("renderer_draw_frame()")
    C3 --> C4("SwapBuffers")
    C4 -->|"next frame"| C1
    C --> D("app_cleanup()")
    D --> E("🏁 End")

    classDef entryExit fill:#fce4ec,stroke:#c2185b,stroke-width:2.5px,color:#2d2d2d
    classDef keyFunc fill:#fff59d,stroke:#f9a825,stroke-width:2px,color:#2d2d2d
    classDef subsystem fill:#ffffff,stroke:#aaaaaa,stroke-width:1.5px,color:#444444
    classDef loopNode fill:#e3f2fd,stroke:#42a5f5,stroke-width:1.5px,color:#2d2d2d

    class A,E entryExit
    class B,C,D keyFunc
    class B1,B2,B3,B4,B5,B6 subsystem
    class C1,C2,C3,C4 loopNode

Chapter 2 — Opening a Window (GLFW + X11 + OpenGL)

The first real work happens in window_create() (src/window.c).

2.1 — GLFW Initialization and Window Hints

glfwInit();
glfwWindowHint(GLFW_CONTEXT_VERSION_MAJOR, 4);
glfwWindowHint(GLFW_CONTEXT_VERSION_MINOR, 4);          // OpenGL 4.4
glfwWindowHint(GLFW_OPENGL_PROFILE, GLFW_OPENGL_CORE_PROFILE);
glfwWindowHint(GLFW_OPENGL_DEBUG_CONTEXT, GL_TRUE);     // Debug messages
glfwWindowHint(GLFW_SAMPLES, DEFAULT_SAMPLES);           // MSAA = 1 (off)

Behind the scenes, GLFW performs a full X11 handshake:

sequenceDiagram
    participant App as Application
    participant GLFW as GLFW
    participant X11 as X11 Server
    participant Mesa as Mesa/GPU Driver
    participant GPU as GPU

    App->>GLFW: glfwInit()
    GLFW->>X11: XOpenDisplay()
    X11-->>GLFW: Display* (connection)

    App->>GLFW: glfwCreateWindow(1920, 1080)
    GLFW->>X11: XCreateWindow() + GLX setup
    X11->>Mesa: glXCreateContextAttribsARB(4.4 Core, Debug)
    Mesa->>GPU: Allocate command buffer + context state
    Mesa-->>X11: GLXContext
    X11-->>GLFW: Window + Context ready

    App->>GLFW: glfwMakeContextCurrent()
    GLFW->>Mesa: glXMakeCurrent()
    Mesa->>GPU: Bind context to calling thread

2.2 — GLAD: Loading OpenGL Function Pointers

gladLoadGLLoader((GLADloadproc)glfwGetProcAddress);

OpenGL is not a library in the traditional sense; it is a specification. The actual function addresses live inside the GPU driver (Mesa, NVIDIA, AMD). GLAD queries each address at runtime through the loader it is given, here glfwGetProcAddress (which resolves to glXGetProcAddress on X11/GLX), and populates a function pointer table. After this call, glCreateShader, glDispatchCompute, etc. become usable.

2.3 — OpenGL Debug Context

setup_opengl_debug();

This enables GL_DEBUG_OUTPUT_SYNCHRONOUS and registers a callback that receives every GL error, warning, and performance hint. A hash table deduplicates messages, so each distinct message is logged only on its first occurrence.

2.4 — Input Capture and VSync

glfwSwapInterval(0);                    // VSync OFF — unlimited FPS
glfwSetInputMode(app->window, GLFW_CURSOR, GLFW_CURSOR_DISABLED);  // FPS-style

The cursor is captured in relative mode — mouse movements produce delta offsets for orbit camera control.


Chapter 3 — CPU-Side Initialization

Before touching the GPU, several CPU-side systems are bootstrapped.

3.1 — The Orbit Camera

camera_init(&app->camera, 20.0F, -90.0F, 0.0F);

The camera starts at:

  • Distance: 20 units from the origin
  • Yaw: −90° (looking along −Z)
  • Pitch: 0° (horizon level)
  • FOV: 60° vertical
  • Z-clip: [0.1, 1000.0]

It uses a fixed-timestep physics model (60 Hz) with exponential smoothing for rotation:

graph LR
    subgraph "Camera Update Pipeline"
        A("Mouse Delta") -->|"EMA filter"| B("yaw_target / pitch_target")
        B -->|"Lerp α=0.1"| C("yaw / pitch (smoothed)")
        C --> D("camera_update_vectors()")
        D --> E("front, right, up vectors")
        E --> F("View Matrix (lookAt)")
    end

    subgraph "Physics (Fixed 60Hz)"
        G("WASD Keys") --> H("Target Velocity")
        H -->|"acceleration × dt"| I("Current Velocity")
        I -->|"friction"| J("Position += vel × dt")
        J --> K("Head bobbing (sine wave)")
    end

    classDef keyFunc fill:#fff59d,stroke:#f9a825,stroke-width:2px,color:#2d2d2d
    classDef subsystem fill:#ffffff,stroke:#aaaaaa,stroke-width:1.5px,color:#444444
    classDef loopNode fill:#e3f2fd,stroke:#42a5f5,stroke-width:1.5px,color:#2d2d2d

    class D,F keyFunc
    class A,B,C,E subsystem
    class G,H,I,J,K loopNode

3.2 — Async Loader Thread

app->async_loader = async_loader_create(&app->tracy_mgr);

A dedicated POSIX thread is spawned for background I/O. It sleeps on a condition variable (pthread_cond_wait) until work is queued. This prevents disk reads from stalling the render loop.

stateDiagram-v2
    [*] --> IDLE
    IDLE --> PENDING: async_loader_request()
    PENDING --> LOADING: Worker wakes up
    LOADING --> WAITING_FOR_PBO: I/O complete, needs GPU buffer
    WAITING_FOR_PBO --> CONVERTING: Main thread provides PBO
    CONVERTING --> READY: SIMD Float→Half conversion done
    READY --> IDLE: Main thread consumes result
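
The state diagram above is a simple cycle, which makes it easy to validate transitions in code. The following is a minimal sketch; the LoaderState enum and both function names are hypothetical and only mirror the states named in the diagram.

```c
#include <stdbool.h>

/* Hypothetical mirror of the loader states named in the diagram. */
typedef enum {
    LOADER_IDLE,
    LOADER_PENDING,
    LOADER_LOADING,
    LOADER_WAITING_FOR_PBO,
    LOADER_CONVERTING,
    LOADER_READY,
    LOADER_STATE_COUNT
} LoaderState;

/* Each state has exactly one legal successor in the diagram. */
static LoaderState loader_next_state(LoaderState s)
{
    switch (s) {
    case LOADER_IDLE:            return LOADER_PENDING;
    case LOADER_PENDING:         return LOADER_LOADING;
    case LOADER_LOADING:         return LOADER_WAITING_FOR_PBO;
    case LOADER_WAITING_FOR_PBO: return LOADER_CONVERTING;
    case LOADER_CONVERTING:      return LOADER_READY;
    case LOADER_READY:           return LOADER_IDLE;
    default:                     return LOADER_IDLE;
    }
}

/* True only for the single edge leaving `from` in the diagram. */
static bool loader_transition_allowed(LoaderState from, LoaderState to)
{
    return from < LOADER_STATE_COUNT && loader_next_state(from) == to;
}
```

In the diagram, the main thread drives the edges out of IDLE, WAITING_FOR_PBO, and READY, while the worker drives the others, so a check like this can catch one thread stepping on the other's transition.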

Chapter 4 — Scene Initialization (The GPU Wakes Up)

scene_init() (src/scene.c) is where the GPU gets its first real work.

4.1 — Default Scene State

scene->subdivisions    = 3;                     // Icosphere level 3
scene->wireframe       = 0;                     // Solid fill
scene->show_envmap     = 1;                     // Skybox visible
scene->billboard_mode  = 1;                     // Transparent spheres (billboard)
scene->sorting_mode    = SORTING_MODE_GPU_BITONIC;  // GPU sorting
scene->gi_mode         = GI_MODE_OFF;           // No GI
scene->specular_aa_enabled = 1;                 // Curvature-based AA

4.2 — Dummy Textures and BRDF LUT

Two sentinel textures are created immediately — they serve as fallbacks whenever an IBL texture isn't ready:

scene->dummy_black_tex = render_utils_create_color_texture(0.0, 0.0, 0.0, 0.0);  // 1×1 RGBA
scene->dummy_white_tex = render_utils_create_color_texture(1.0, 1.0, 1.0, 1.0);  // 1×1 RGBA

Then the BRDF LUT (Look-Up Table) is generated once via compute shader:

scene->brdf_lut_tex = build_brdf_lut_map(512);

| Property | Value |
| --- | --- |
| Size | 512 × 512 |
| Format | GL_RG16F (2 channels, 16-bit float each) |
| Content | Pre-integrated split-sum BRDF (Schlick-GGX) |
| Shader | shaders/IBL/spbrdf.glsl (compute) |
| Work groups | 16 × 16 (512/32 per axis) |

This texture maps (NdotV, roughness) → (F0_scale, F0_bias) and is used every frame by the PBR fragment shader to avoid expensive real-time BRDF integration.
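
To show how the two LUT channels are consumed, here is a minimal CPU-side sketch of the split-sum combination. The Rgb type and ibl_specular name are hypothetical; the engine does this in GLSL, and the exact shader code is not reproduced here.

```c
/* Hypothetical RGB triple for illustration. */
typedef struct { float r, g, b; } Rgb;

/* Split-sum specular term: prefiltered * (F0 * scale + bias).
 * `scale` and `bias` are the two channels fetched from the BRDF LUT at
 * (NdotV, roughness); `prefiltered` is the specular prefilter map sample
 * at the mip level matching the roughness. */
static Rgb ibl_specular(Rgb prefiltered, Rgb f0, float scale, float bias)
{
    Rgb out;
    out.r = prefiltered.r * (f0.r * scale + bias);
    out.g = prefiltered.g * (f0.g * scale + bias);
    out.b = prefiltered.b * (f0.b * scale + bias);
    return out;
}
```

Because scale and bias depend only on the viewing angle and roughness, they can be precomputed once, which is why the LUT is built a single time at startup.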

4.3 — Two Rendering Modes: Billboard Ray-Tracing vs. Icosphere Mesh

The engine supports two sphere rendering strategies. The default is billboard ray-tracing.

Default: Billboard + Per-Pixel Ray-Tracing (billboard_mode = 1)

Each sphere is rendered as a single screen-aligned quad (4 vertices, 2 triangles). The fragment shader performs an analytical ray-sphere intersection per pixel, producing mathematically perfect spheres.

Billboard AABB geometry — the projected quad encloses the sphere on screen

The vertex shader projects a tight quad around the sphere's screen-space bounding box via analytical tangent-line computation.

Advantages:

  • Pixel-perfect silhouettes (no polygon faceting, ever)
  • Correct per-pixel depth (gl_FragDepth written from the ray hit point)
  • Analytically smooth normals (normalized hitPos − center)
  • Edge anti-aliasing via smooth discriminant falloff
  • True alpha transparency (glass-like, with back-to-front sorting)

graph LR
    subgraph "Billboard Ray-Tracing (Default)"
        A("4-vertex Quad<br/>(per instance)") -->|"Vertex Shader:<br/>project to sphere bounds"| B("Screen-space quad")
        B -->|"Fragment Shader:<br/>ray-sphere intersection"| C("Perfect sphere<br/>per-pixel normal + depth")
    end

    subgraph "Icosphere Mesh (Fallback)"
        D("642-vertex mesh<br/>(subdivided icosahedron)") -->|"Rasterized as<br/>triangles"| E("Polygon approximation<br/>(faceted at low subdiv)")
    end

    classDef highlight fill:#fff59d,stroke:#f9a825,stroke-width:2px,color:#2d2d2d
    classDef fallback fill:#f5f5f5,stroke:#bdbdbd,stroke-width:1.5px,color:#666666

    class A,B,C highlight
    class D,E fallback

💡 Why Billboard Ray-Tracing? With 100 spheres, the billboard approach uses 100 × 4 = 400 vertices total, versus 100 × 642 = 64,200 vertices for level-3 icospheres. More importantly, the spheres are mathematically perfect at every zoom level — no tessellation artifacts.

Fallback: Instanced Icosphere Mesh (billboard_mode = 0)

The icosphere path generates a recursively subdivided icosahedron:

graph LR
    A("Level 0<br/>12 vertices<br/>20 triangles") -->|"Subdivide"| B("Level 1<br/>42 vertices<br/>80 triangles")
    B -->|"Subdivide"| C("Level 2<br/>162 vertices<br/>320 triangles")
    C -->|"Subdivide"| D("Level 3<br/>642 vertices<br/>1,280 triangles")
    D -->|"..."| E("Level 6<br/>~40k vertices")

    classDef keyFunc fill:#fff59d,stroke:#f9a825,stroke-width:2px,color:#2d2d2d
    classDef subsystem fill:#ffffff,stroke:#aaaaaa,stroke-width:1.5px,color:#444444

    class D keyFunc
    class A,B,C,E subsystem
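
The counts in the subdivision diagram follow a closed form: each level splits every triangle into four, so a level-n icosphere has 20·4ⁿ triangles and, with shared (indexed) vertices, 10·4ⁿ + 2 vertices. A small sketch, with hypothetical function names:

```c
#include <stdint.h>

/* Triangle count of an icosphere after `level` subdivisions: 20 * 4^level.
 * Valid for level <= 13 before the 32-bit result overflows. */
static uint32_t icosphere_triangles(unsigned level)
{
    return 20u << (2u * level);
}

/* Vertex count assuming shared vertices are deduplicated (indexed mesh):
 * 10 * 4^level + 2. */
static uint32_t icosphere_vertices(unsigned level)
{
    return (10u << (2u * level)) + 2u;
}
```

At level 6 this gives 40,962 vertices and 81,920 triangles, which matches the "~40k vertices" figure above.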

4.4 — Material Library

scene->material_lib = material_load_presets("assets/materials/pbr_materials.json");

The JSON file defines 101 PBR material presets organized by category:

| Category | Examples | Metallic | Roughness |
| --- | --- | --- | --- |
| Pure Metals | Gold, Silver, Copper, Chrome | 1.0 | 0.05–0.2 |
| Weathered Metals | Rusty Iron, Oxidized Copper | 0.7–0.95 | 0.4–0.8 |
| Glossy Dielectrics | Colored Plastics | 0.0 | 0.05–0.15 |
| Matte Materials | Fabric, Clay, Sand | 0.0 | 0.65–0.95 |
| Stones | Granite, Marble, Obsidian | 0.0 | 0.35–0.85 |
| Organics | Oak, Leather, Bone | 0.0 | 0.35–0.75 |
| Paints | Car Paint, Pearl, Satin | 0.3–0.7 | 0.1–0.5 |
| Technical | Rubber, Carbon, Ceramic | 0.0–0.1 | 0.05–0.85 |

Each material provides: albedo (RGB), metallic (0–1), roughness (0–1).

4.5 — The Instance Grid

const int cols    = 10;       // DEFAULT_COLS
const float spacing = 2.5F;   // DEFAULT_SPACING

A 10×10 grid of 100 spheres is laid out in the XY plane, centered at the origin:

Grid dimensions:
  Width  = (10 - 1) × 2.5 = 22.5 units
  Height = (10 - 1) × 2.5 = 22.5 units
  Z = 0 (all spheres in the same plane)
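
The centering arithmetic above can be sketched as follows. The GridPos type, the function name, and the row-major index order are assumptions for illustration; the engine's actual layout code is not shown.

```c
/* Hypothetical position type for illustration. */
typedef struct { float x, y, z; } GridPos;

/* Position of instance `index` in a cols x rows grid centered on the
 * origin in the XY plane. Row-major ordering is an assumption. */
static GridPos grid_position(int index, int cols, int rows, float spacing)
{
    int col = index % cols;
    int row = index / cols;
    GridPos p;
    /* Shift by half the grid extent so the grid straddles the origin. */
    p.x = ((float)col - (float)(cols - 1) * 0.5f) * spacing;
    p.y = ((float)row - (float)(rows - 1) * 0.5f) * spacing;
    p.z = 0.0f;
    return p;
}
```

With cols = rows = 10 and spacing = 2.5, the corner spheres land at ±11.25 on each axis, giving the 22.5-unit extent computed above.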

Each instance stores 88 bytes:

typedef struct SphereInstance {
    mat4  model;      // 64 bytes — 4×4 transform matrix
    vec3  albedo;     // 12 bytes — RGB color
    float metallic;   //  4 bytes
    float roughness;  //  4 bytes
    float ao;         //  4 bytes — always 1.0
} SphereInstance;     // Total: 88 bytes per instance
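
The 88-byte figure can be checked at compile time. The sketch below uses plain float arrays in place of cglm's mat4/vec3 (an assumption made so the example is self-contained); DemoSphereInstance is a hypothetical name.

```c
#include <stddef.h>

/* Plain float arrays stand in for cglm's mat4/vec3 here; they only
 * require 4-byte alignment, so no padding is inserted. */
typedef struct DemoSphereInstance {
    float model[4][4];  /* 64 bytes */
    float albedo[3];    /* 12 bytes */
    float metallic;     /*  4 bytes */
    float roughness;    /*  4 bytes */
    float ao;           /*  4 bytes */
} DemoSphereInstance;

_Static_assert(sizeof(DemoSphereInstance) == 88, "instance must be 88 bytes");
_Static_assert(offsetof(DemoSphereInstance, albedo) == 64, "albedo offset");
_Static_assert(offsetof(DemoSphereInstance, metallic) == 76, "metallic offset");
```

If the matrix type carries a 16-byte alignment requirement (which cglm's types can, depending on build options), the compiler would round the struct size up to 96 bytes, so an assertion of this kind is useful wherever the size is also used as the VBO stride.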

4.6 — VAO Layout (Billboard Mode)

In billboard mode, the VAO binds a 4-vertex quad and per-instance material data:

┌────────────────────────────────────────────────────────────────┐
│              Billboard VAO (Default Rendering Mode)            │
├────────────┬────────────┬─────────────────────────────────────┤
│  Location  │  Source    │  Description                        │
├────────────┼────────────┼─────────────────────────────────────┤
│  0         │  Quad VBO  │  vec3 position   (±0.5 quad verts)  │
│  1         │  Quad VBO  │  vec3 normal     (stub, unused)     │
│  2–5       │  Inst VBO  │  mat4 model      (per-instance)     │
│  6         │  Inst VBO  │  vec3 albedo     (per-instance)     │
│  7         │  Inst VBO  │  vec3 pbr (M,R,AO) (per-instance)   │
└────────────┴────────────┴─────────────────────────────────────┘

Location 0–1: glVertexAttribDivisor = 0 (advance per vertex, 4 verts)
Location 2–7: glVertexAttribDivisor = 1 (advance per instance)

Draw call: glDrawArraysInstanced(GL_TRIANGLE_STRIP, 0, 4, 100) — 100 quads, face culling disabled.

4.7 — Shader Compilation

All shaders are compiled during scene_init(). The loader (src/shader.c) supports a custom @header include system:

// In pbr_ibl_instanced.frag:
@header "pbr_functions.glsl"
@header "sh_probe.glsl"

This recursively inlines files (max depth: 16) with include-guard deduplication.

graph TD
    INIT("scene_init() — Shader Compilation") --> REND
    INIT --> COMP
    INIT --> POST

    subgraph REND ["🎨 Rendering Programs"]
        direction TB
        PBR("PBR Instanced — pbr_ibl_instanced.vert/.frag")
        BB("PBR Billboard — pbr_ibl_billboard.vert/.frag")
        SKY("Skybox — background.vert/.frag")
        UI("UI Overlay — ui.vert/.frag")
    end

    subgraph COMP ["⚡ Compute Shaders"]
        direction TB
        SPMAP("Specular Prefilter — IBL/spmap.glsl")
        IRMAP("Irradiance Conv. — IBL/irmap.glsl")
        BRDF("BRDF LUT — IBL/spbrdf.glsl")
        LUM("Luminance Reduction — IBL/luminance_reduce")
    end

    subgraph POST ["✨ Post-Process"]
        direction TB
        PP("Final Composite — postprocess.vert/.frag")
        BL("Bloom — down/up/prefilter")
    end

    classDef keyFunc fill:#fff59d,stroke:#f9a825,stroke-width:2px,color:#2d2d2d
    classDef render fill:#e3f2fd,stroke:#42a5f5,stroke-width:1.5px,color:#2d2d2d
    classDef compute fill:#f3e5f5,stroke:#ab47bc,stroke-width:1.5px,color:#2d2d2d
    classDef postfx fill:#fce4ec,stroke:#c2185b,stroke-width:1.5px,color:#2d2d2d

    class INIT keyFunc
    class PBR,BB,SKY,UI render
    class SPMAP,IRMAP,BRDF,LUM compute
    class PP,BL postfx

Chapter 5 — Post-Processing Pipeline Setup

postprocess_init(&app->postprocess, &app->gpu_profiler, 1920, 1080);

5.1 — The Scene FBO (Multi-Render Target)

The main offscreen framebuffer uses MRT (Multiple Render Targets):

| Attachment | Format | Size | Purpose |
| --- | --- | --- | --- |
| GL_COLOR_ATTACHMENT0 | GL_RGBA16F | 1920×1080 | HDR scene color (alpha = luma for FXAA) |
| GL_COLOR_ATTACHMENT1 | GL_RG16F | 1920×1080 | Per-pixel velocity for motion blur |
| GL_DEPTH_STENCIL_ATTACHMENT | GL_DEPTH32F_STENCIL8 | 1920×1080 | Depth buffer + stencil mask |
| Stencil view | GL_R8UI | 1920×1080 | Read-only stencil as texture |

graph TD
    FBO("Scene FBO — Multi-Render Target") --> C0
    FBO --> C1
    FBO --> DS
    DS --> SV

    C0("🟦 Color 0: GL_RGBA16F<br/>HDR Scene Color")
    C1("🟩 Color 1: GL_RG16F<br/>Velocity Vectors")
    DS("🟫 Depth/Stencil: GL_DEPTH32F_STENCIL8")
    SV("🟪 Stencil View: GL_R8UI (TextureView)")

    classDef keyFunc fill:#fff59d,stroke:#f9a825,stroke-width:2px,color:#2d2d2d
    classDef color0 fill:#e3f2fd,stroke:#42a5f5,stroke-width:1.5px,color:#2d2d2d
    classDef color1 fill:#e8f5e9,stroke:#66bb6a,stroke-width:1.5px,color:#2d2d2d
    classDef depth fill:#fff3e0,stroke:#ff9800,stroke-width:1.5px,color:#2d2d2d
    classDef stencil fill:#f3e5f5,stroke:#ab47bc,stroke-width:1.5px,color:#2d2d2d

    class FBO keyFunc
    class C0 color0
    class C1 color1
    class DS depth
    class SV stencil

5.2 — Sub-Effect Resources

Each post-processing effect initializes its own resources:

| Effect | GPU Resources |
| --- | --- |
| Bloom | Mip-chain FBOs (6 levels), prefilter/downsample/upsample textures |
| DoF | Blur texture, CoC (Circle of Confusion) texture |
| Auto-Exposure | Luminance downsample texture, 2× PBOs (readback), 2× GLsync fences |
| Motion Blur | Tile-max velocity texture (compute), neighbor-max texture (compute) |
| 3D LUT | 32³ GL_TEXTURE_3D loaded from .cube files |

5.3 — Default Active Effects

postprocess_enable(&app->postprocess, POSTFX_FXAA);  // Only FXAA

On startup, only FXAA is active. Other effects are toggled at runtime via keyboard shortcuts.


Chapter 6 — The First HDR Environment Load

env_manager_load(&app->env_mgr, app->async_loader, "env.hdr");

This triggers the asynchronous environment loading pipeline — the most complex multi-frame operation in the engine.

6.1 — Async Loading Sequence

sequenceDiagram
    participant Main as Main Thread (Render)
    participant Worker as Async Worker Thread
    participant GPU as GPU

    Main->>Worker: async_loader_request("env.hdr")
    Note over Worker: State: PENDING → LOADING
    Worker->>Worker: stbi_loadf() — decode HDR to float RGBA
    Note over Worker: ~50ms for 2K HDR on NVMe

    Worker-->>Main: State: WAITING_FOR_PBO
    Main->>GPU: glGenBuffers() → PBO
    Main->>GPU: glMapBuffer(PBO, WRITE)
    Main-->>Worker: async_loader_provide_pbo(pbo_ptr)

    Note over Worker: State: CONVERTING
    Worker->>Worker: SIMD float32 → float16 conversion
    Note over Worker: ~2ms for 2048×1024

    Worker-->>Main: State: READY
    Main->>GPU: glUnmapBuffer(PBO)
    Main->>GPU: glTexSubImage2D(from PBO)
    Note over GPU: DMA transfer: PBO → VRAM
    Main->>GPU: glGenerateMipmap()

6.2 — Transition State Machine

During the first load, the screen stays black (no crossfade from a previous scene):

stateDiagram-v2
    [*] --> WAIT_IBL: "First load"
    WAIT_IBL --> WAIT_IBL: "IBL in progress..."
    WAIT_IBL --> FADE_IN: "IBL complete"
    FADE_IN --> IDLE: "Alpha reaches 0"

WAIT_IBL: transition_alpha = 1.0 (fully opaque black) — the screen is black during the first few frames.

FADE_IN: Alpha decreases from 1.0 → 0.0 over 250ms.
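
The fade-in can be expressed as a function of elapsed time. The sketch below assumes a linear ramp, which the article does not state; the engine may use a different easing curve, and transition_fade_alpha is a hypothetical name.

```c
/* Overlay opacity during FADE_IN: 1.0 at the start, 0.0 once
 * `duration_s` seconds have elapsed. Linear ramp (an assumption). */
static float transition_fade_alpha(float elapsed_s, float duration_s)
{
    if (duration_s <= 0.0f) {
        return 0.0f;  /* degenerate duration: treat the fade as finished */
    }
    float t = elapsed_s / duration_s;
    if (t < 0.0f) t = 0.0f;
    if (t > 1.0f) t = 1.0f;
    return 1.0f - t;
}
```

Driving the fade from elapsed time rather than from a frame counter keeps the 250 ms duration independent of frame rate, which matters with VSync off.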


Chapter 7 — IBL Generation (Progressive, Multi-Frame)

Once the HDR texture is uploaded, the IBL Coordinator (src/ibl_coordinator.c) takes over. It computes three maps across multiple frames to avoid GPU stalls.

7.1 — The Three IBL Maps

graph TB
    HDR("HDR Environment Map<br/>2048×1024 equirectangular<br/>GL_RGBA16F") --> SPEC
    HDR --> IRR
    HDR --> LUM

    SPEC("Specular Prefilter Map<br/>1024×1024 × 5 mip levels<br/>Compute: spmap.glsl")
    IRR("Irradiance Map<br/>64×64<br/>Compute: irmap.glsl")
    LUM("Luminance Reduction<br/>1×1 average<br/>Compute: luminance_reduce")

    SPEC -->|"Per-pixel reflection<br/>roughness → mip level"| PBR("PBR Shader")
    IRR -->|"Diffuse hemisphere<br/>integral"| PBR
    LUM -->|"Auto exposure<br/>threshold"| PP("Post-Process")

    classDef keyFunc fill:#fff59d,stroke:#f9a825,stroke-width:2px,color:#2d2d2d
    classDef compute fill:#f3e5f5,stroke:#ab47bc,stroke-width:1.5px,color:#2d2d2d
    classDef target fill:#e3f2fd,stroke:#42a5f5,stroke-width:1.5px,color:#2d2d2d

    class HDR keyFunc
    class SPEC,IRR,LUM compute
    class PBR,PP target

| Map | Resolution | Format | Mip Levels | Compute Shader |
| --- | --- | --- | --- | --- |
| Specular Prefilter | 1024×1024 | GL_RGBA16F | 5 | IBL/spmap.glsl |
| Irradiance | 64×64 | GL_RGBA16F | 1 | IBL/irmap.glsl |
| Luminance | 1×1 | GL_R32F | 1 | IBL/luminance_reduce_pass1/2.glsl |

7.2 — Progressive Slicing Strategy

To avoid frame spikes, each mip level is subdivided into slices processed over consecutive frames:

| IBL Stage | Hardware GPU | Software GPU (llvmpipe) |
| --- | --- | --- |
| Specular Mip 0 (1024²) | 24 slices (42 rows each) | 1 slice (full) |
| Specular Mip 1 (512²) | 8 slices | 1 slice |
| Specular Mips 2–4 | Grouped (1 dispatch) | 1 slice |
| Irradiance (64²) | 12 slices | 1 slice |
| Luminance | 2 dispatches (pass 1 + 2) | 2 dispatches |

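
Splitting a mip level into row slices needs some care when the height does not divide evenly: 24 slices of 42 rows would cover only 1008 of the 1024 rows of mip 0. The sketch below is one way to partition the rows, rounding up and clamping the last slice; it is an assumption about the approach, not the coordinator's actual code, and the names are hypothetical.

```c
/* Half-open range of rows processed by one slice. */
typedef struct {
    int first_row;
    int row_count;
} SliceRange;

/* Rows covered by slice `slice_index` when `total_rows` rows are split
 * into `slice_count` slices. Uses ceiling division so every row is
 * covered, and clamps the final slice to the texture height. */
static SliceRange slice_rows(int total_rows, int slice_count, int slice_index)
{
    int rows_per_slice = (total_rows + slice_count - 1) / slice_count;
    SliceRange r;
    r.first_row = slice_index * rows_per_slice;
    r.row_count = rows_per_slice;
    if (r.first_row >= total_rows) {
        r.first_row = total_rows;
        r.row_count = 0;  /* nothing left for this slice */
    } else if (r.first_row + r.row_count > total_rows) {
        r.row_count = total_rows - r.first_row;
    }
    return r;
}
```

For 1024 rows and 24 slices this yields 23 slices of 43 rows and a final slice of 35 rows, one dispatch per frame.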
gantt
    title Progressive IBL Generation Timeline
    dateFormat x
    axisFormat Frame %s

    section Luminance
    Luminance Pass 1       :lum1, 0, 1000
    Luminance Wait (fence) :lum2, 1000, 2000
    Luminance Readback     :lum3, 2000, 3000

    section Specular Mip 0
    Slice 1/24             :s1, 3000, 4000
    Slice 2/24             :s2, 4000, 5000
    Slice ...              :s3, 5000, 6000
    Slice 24/24            :s4, 6000, 7000

    section Specular Mip 1
    Slices 1-8             :m1, 7000, 9000

    section Specular Mips 2-4
    Grouped dispatch       :m3, 9000, 10000

    section Irradiance
    Slices 1-12            :i1, 10000, 12000

    section Done
    IBL Complete → Fade In :ibl_done, 12000, 13000

7.3 — IBL State Machine

enum IBLState {
    IBL_STATE_IDLE,             // No work
    IBL_STATE_LUMINANCE,        // Pass 1: luminance reduction
    IBL_STATE_LUMINANCE_WAIT,   // Wait for readback fence
    IBL_STATE_SPECULAR_INIT,    // Allocate specular texture
    IBL_STATE_SPECULAR_MIPS,    // Progressive mip generation
    IBL_STATE_IRRADIANCE,       // Progressive irradiance convolution
    IBL_STATE_DONE              // All maps ready
};

Chapter 8 — The Main Loop

app_run() (src/app.c) is the heartbeat — a classic uncapped game loop with fixed-timestep physics.

graph TD
    A("① glfwPollEvents() — Keyboard, mouse, resize")
    A --> B("② Time & FPS — delta_time, frame_count")
    B --> C("③ Camera Physics — Fixed 60Hz, smooth rotation lerp")
    C --> D("④ Geometry Update — if subdivisions changed")
    D --> E("⑤ app_update() — Process input state")
    E --> F("⑥ renderer_draw_frame() — THE BIG ONE")
    F --> G("⑦ Tracy screenshots — profiling")
    G --> H("⑧ glfwSwapBuffers() — Present to screen")
    H -->|"next frame"| A

    classDef keyFunc fill:#fff59d,stroke:#f9a825,stroke-width:2px,color:#2d2d2d
    classDef loopNode fill:#e3f2fd,stroke:#42a5f5,stroke-width:1.5px,color:#2d2d2d
    classDef subsystem fill:#ffffff,stroke:#aaaaaa,stroke-width:1.5px,color:#444444

    class F keyFunc
    class A,B,C,D,E,G loopNode
    class H subsystem

8.1 — Deferred Resize

Window resize events are deferred — the GLFW callback only records the new dimensions. The actual FBO recreation happens at the start of the next frame, outside the callback's limited context.
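
The deferral pattern can be sketched with a small pending-request record. The PendingResize type and both function names are hypothetical; in the engine the recording side would be invoked from the GLFW size callback.

```c
#include <stdbool.h>

/* Latest resize request recorded by the window callback. */
typedef struct {
    bool pending;
    int  width;
    int  height;
} PendingResize;

/* Called from the window-size callback: only record the request. */
static void resize_record(PendingResize* r, int width, int height)
{
    r->width   = width;
    r->height  = height;
    r->pending = true;
}

/* Called once at the start of a frame. Returns true, and the most recent
 * size, if the framebuffers need to be recreated. */
static bool resize_consume(PendingResize* r, int* width, int* height)
{
    if (!r->pending) {
        return false;
    }
    *width     = r->width;
    *height    = r->height;
    r->pending = false;
    return true;
}
```

A side benefit of this design is coalescing: a burst of resize events during a window drag results in a single FBO recreation per frame.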

8.2 — Camera Fixed-Timestep Integration

app->camera.physics_accumulator += (float)app->delta_time;
while (app->camera.physics_accumulator >= app->camera.fixed_timestep) {
    camera_fixed_update(&app->camera);  // Velocity, friction, bobbing
    app->camera.physics_accumulator -= app->camera.fixed_timestep;
}

// Smooth rotation (exponential interpolation)
float alpha = app->camera.rotation_smoothing;  // ~0.1
app->camera.yaw   += (app->camera.yaw_target   - app->camera.yaw)   * alpha;
app->camera.pitch += (app->camera.pitch_target - app->camera.pitch) * alpha;
camera_update_vectors(&app->camera);

This ensures deterministic physics regardless of frame rate, while rotation stays smooth via per-frame interpolation.
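
The accumulator pattern can be isolated into a small helper that reports how many fixed updates to run for a given frame. This is a sketch with hypothetical names; the max_steps cap is an addition of this example (the loop shown above has no such cap) to bound catch-up work after a long stall.

```c
/* Accumulates frame time and hands it out in fixed-size steps. */
typedef struct {
    float accumulator;     /* unspent simulation time, in seconds */
    float fixed_timestep;  /* length of one physics step, in seconds */
} FixedStepClock;

/* Adds one frame's delta time and returns how many fixed updates to run.
 * At most `max_steps` are returned; any backlog beyond that is dropped. */
static int fixed_step_advance(FixedStepClock* clock, float delta_time,
                              int max_steps)
{
    int steps = 0;
    clock->accumulator += delta_time;
    while (clock->accumulator >= clock->fixed_timestep && steps < max_steps) {
        clock->accumulator -= clock->fixed_timestep;
        ++steps;
    }
    /* Discard leftover backlog so it cannot pile up frame after frame. */
    if (steps == max_steps && clock->accumulator >= clock->fixed_timestep) {
        clock->accumulator = 0.0f;
    }
    return steps;
}
```

A fast frame may return 0 steps and a slow frame several, but each step always simulates the same amount of time, which is what makes the physics independent of frame rate.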


Chapter 9 — Rendering a Frame

renderer_draw_frame() (src/renderer.c) orchestrates the full rendering pipeline.

9.1 — High-Level Architecture

graph TD
    A("GPU Profiler Begin") --> B("postprocess_begin() — Bind Scene FBO, Clear")
    B --> C("camera_get_view_matrix()")
    C --> D("glm_perspective() — FOV=60°, near=0.1, far=1000")
    D --> E("ViewProj = Proj × View")
    E --> G1("🌅 Pass 1: Skybox — depth disabled")
    G1 --> G2("🔢 Pass 2: Sphere Sorting — GPU Bitonic")
    G2 --> G3("🔮 Pass 3: PBR Spheres — instanced billboard draw")
    G3 --> H("✨ postprocess_end() — 7-Stage Pipeline")
    H --> I("🖥️ UI Overlay + Env Transition")

    classDef keyFunc fill:#fff59d,stroke:#f9a825,stroke-width:2px,color:#2d2d2d
    classDef loopNode fill:#e3f2fd,stroke:#42a5f5,stroke-width:1.5px,color:#2d2d2d
    classDef setup fill:#ffffff,stroke:#aaaaaa,stroke-width:1.5px,color:#444444
    classDef postfx fill:#fce4ec,stroke:#c2185b,stroke-width:1.5px,color:#2d2d2d

    class G1,G2,G3 keyFunc
    class A,B,C,D,E setup
    class H,I postfx

9.2 — Pass 1: Skybox

The skybox is drawn first, with depth testing disabled. It uses a fullscreen quad trick:

// background.vert — reconstruct world-space ray
gl_Position = vec4(in_position.xy, 1.0, 1.0);  // Depth = 1.0 (far plane)
vec4 pos = m_inv_view_proj * vec4(in_position.xy, 1.0, 1.0);
RayDir = pos.xyz / pos.w;  // Reconstructed world-space ray

// background.frag: equirectangular sampling of the HDR
vec2 uv = SampleEquirectangular(normalize(RayDir));
vec3 envColor = textureLod(environmentMap, uv, blur_lod).rgb;
envColor = clamp(envColor, vec3(0.0), vec3(200.0));  // NaN protection + anti-fireflies
FragColor = vec4(envColor, luma);  // Alpha = luma for FXAA
VelocityOut = vec2(0.0);          // No motion for skybox

9.3 — Pass 2: Sphere Sorting (GPU Bitonic Sort)

For transparent billboard rendering, spheres must be drawn back-to-front:

| Mode | Where | Algorithm | Complexity |
| --- | --- | --- | --- |
| CPU_QSORT | CPU | qsort() (stdlib) | O(n·log n) avg |
| CPU_RADIX | CPU | Radix sort | O(n·k) |
| GPU_BITONIC | GPU | Bitonic merge sort (compute) | O(n·log²n) |
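
As a reference for what the compute shader does, here is a CPU sketch of a bitonic sorting network that orders keys back-to-front. The SortKey type and function name are hypothetical, and the engine's GPU version is not reproduced here. The network requires a power-of-two element count, so 100 spheres would be padded to 128 entries with keys that sort last.

```c
#include <stddef.h>

/* One sortable entry: view-space distance plus the instance it refers to. */
typedef struct {
    float    depth;
    unsigned index;
} SortKey;

/* In-place bitonic sort, descending by depth (farthest first).
 * `n` must be a power of two. Every pass of the inner loop is a set of
 * independent compare-exchanges, which is what maps well to a GPU. */
static void bitonic_sort_back_to_front(SortKey* keys, size_t n)
{
    for (size_t k = 2; k <= n; k <<= 1) {          /* size of merged runs */
        for (size_t j = k >> 1; j > 0; j >>= 1) {  /* compare distance */
            for (size_t i = 0; i < n; ++i) {
                size_t partner = i ^ j;
                if (partner <= i) {
                    continue;  /* handle each pair once */
                }
                int descending = ((i & k) == 0);
                int out_of_order = descending
                    ? (keys[i].depth < keys[partner].depth)
                    : (keys[i].depth > keys[partner].depth);
                if (out_of_order) {
                    SortKey tmp   = keys[i];
                    keys[i]       = keys[partner];
                    keys[partner] = tmp;
                }
            }
        }
    }
}
```

The two outer loops run O(log² n) passes over n elements, which is where the O(n·log²n) figure in the table comes from.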

9.4 — Pass 3: PBR Spheres — Billboard Ray-Tracing

This is the core rendering pass. A single draw call renders all 100 spheres:

| Metric | Value |
| --- | --- |
| Vertices per sphere | 4 (billboard quad) |
| Triangles per sphere | 2 (triangle strip) |
| Instances | 100 (10×10 grid) |
| Total vertices | 400 |
| Draw calls | 1 |
| Sphere precision | Mathematically perfect (ray-traced) |

9.5 — The Billboard Fragment Shader (Ray-Sphere Intersection)

The fragment shader (pbr_ibl_billboard.frag) is where the real magic happens. Instead of shading a rasterized mesh, it analytically intersects a ray with a perfect sphere:

Ray-sphere intersection — the geometric principle

Analytical ray-sphere intersection: the discriminant determines whether the pixel hits the sphere.

// Analytical ray-sphere intersection
vec3 oc = rayOrigin - center;
float b = dot(oc, rayDir);
float c = dot(oc, oc) - radius * radius;
float discriminant = b * b - c;  // >0 = hit, <0 = miss
if (discriminant < 0.0) discard;
float t = -b - sqrt(discriminant);  // nearest intersection
vec3 hitPos = rayOrigin + t * rayDir;
vec3 N = normalize(hitPos - center);  // perfect analytic normal

graph TD
    R("🔦 Build Ray — origin=camPos, dir=normalize(WorldPos-camPos)")
    R --> INT("📐 Ray-Sphere Intersection — discriminant = b² - c")
    INT --> HIT{"Hit?"}
    HIT -->|"No — disc < 0"| DISCARD("❌ discard — pixel outside sphere")
    HIT -->|"Yes"| HITPOS("✅ hitPos = origin + t × dir")
    HITPOS --> NORMAL("N = normalize(hitPos - center) — perfect normal")
    HITPOS --> DEPTH("gl_FragDepth = project(hitPos) — correct Z-buffer")
    NORMAL --> PBR("V = -rayDir")
    PBR --> FRESNEL("Fresnel-Schlick")
    PBR --> GGX("Smith-GGX Geometry")
    PBR --> NDF("GGX NDF Distribution")
    FRESNEL & GGX & NDF --> SPEC("IBL Specular — prefilterMap × brdfLUT")
    PBR --> DIFF("IBL Diffuse — irradiance(N) × albedo")
    SPEC & DIFF --> FINAL("color = Diffuse + Specular")
    FINAL --> AA("Edge Anti-Aliasing — smoothstep on discriminant")
    AA --> ALPHA("FragColor = vec4(color, edgeFactor) — premultiplied alpha")

    classDef keyFunc fill:#fff59d,stroke:#f9a825,stroke-width:2px,color:#2d2d2d
    classDef compute fill:#e3f2fd,stroke:#42a5f5,stroke-width:1.5px,color:#2d2d2d
    classDef entryExit fill:#fce4ec,stroke:#c2185b,stroke-width:2px,color:#2d2d2d
    classDef subsystem fill:#ffffff,stroke:#aaaaaa,stroke-width:1.5px,color:#444444

    class R,INT,HIT keyFunc
    class HITPOS,NORMAL,DEPTH,PBR compute
    class FRESNEL,GGX,NDF,SPEC,DIFF subsystem
    class FINAL,AA,ALPHA,DISCARD entryExit
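
The "gl_FragDepth = project(hitPos)" step in the diagram can be sketched on the CPU as follows. This assumes a column-major view-projection matrix (the OpenGL convention) and the default depth range of [0, 1]; the function name is hypothetical and the shader's actual code is not shown.

```c
/* Window-space depth for a world-space point.
 * `view_proj` is a column-major 4x4 matrix: element (row, col) lives at
 * view_proj[col * 4 + row]. Assumes the default glDepthRange(0, 1). */
static float window_depth_from_world(const float view_proj[16],
                                     const float p[3])
{
    /* Only the z and w rows of the clip-space position are needed. */
    float clip_z = view_proj[2] * p[0] + view_proj[6] * p[1]
                 + view_proj[10] * p[2] + view_proj[14];
    float clip_w = view_proj[3] * p[0] + view_proj[7] * p[1]
                 + view_proj[11] * p[2] + view_proj[15];
    float ndc_z = clip_z / clip_w;  /* perspective divide: [-1, 1] */
    return ndc_z * 0.5f + 0.5f;     /* remap to [0, 1] */
}
```

Writing this value instead of the quad's own depth is what lets ray-traced spheres intersect each other, and any rasterized geometry, correctly in the depth buffer.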

Analytical Edge Anti-Aliasing

Analytical anti-aliasing of spheres — smoothstep on the discriminant

Analytical anti-aliasing uses the discriminant as a distance-to-edge metric — no MSAA needed for smooth edges.

float pixelSizeWorld = (2.0 * clipW) / (proj[1][1] * screenHeight);
float edgeFactor = smoothstep(0.0, 1.0, discriminant / (2.0 * radius * pixelSizeWorld));
FragColor = vec4(color * edgeFactor, edgeFactor);  // premultiplied alpha
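
Why the discriminant works as an edge metric: for a normalized ray, b² − c equals r² − d², where d is the distance from the sphere center to the ray. Near the silhouette, with d = r − e, this is approximately 2·r·e, so dividing by 2·r·pixelSizeWorld gives roughly the distance to the edge measured in pixels. A C sketch of the same arithmetic, with hypothetical function names:

```c
/* GLSL-style smoothstep: Hermite interpolation clamped to [0, 1]. */
static float smoothstep_f(float edge0, float edge1, float x)
{
    float t = (x - edge0) / (edge1 - edge0);
    if (t < 0.0f) t = 0.0f;
    if (t > 1.0f) t = 1.0f;
    return t * t * (3.0f - 2.0f * t);
}

/* World-space size of one pixel at clip-space depth `clip_w`.
 * `proj11` is the projection matrix element proj[1][1]. */
static float pixel_size_world(float clip_w, float proj11, float screen_height)
{
    return (2.0f * clip_w) / (proj11 * screen_height);
}

/* Coverage factor: 0 at the silhouette, 1 about one pixel inside it. */
static float sphere_edge_factor(float discriminant, float radius,
                                float pixel_size)
{
    return smoothstep_f(0.0f, 1.0f,
                        discriminant / (2.0f * radius * pixel_size));
}
```

Because the ramp width is tied to the pixel footprint, the soft edge stays about one pixel wide whether the sphere is near or far.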

Detail of perfect sphere anti-aliasing

Close-up detail: sphere edges are perfectly smooth thanks to analytical ray-tracing.

Billboard Projection

Sphere AABB projection optimization

The vertex shader computes a tight screen-space quad via analytical tangent-line projection, handling 3 cases: camera outside, inside, or behind the sphere.


Chapter 10 — Post-Processing Pipeline

After the 3D scene is rendered into the MRT FBO, postprocess_end() applies up to 8 effects in a carefully ordered pipeline.

10.1 — The 7-Stage Pipeline

graph TD
    A("Memory Barrier — flush MRT writes")
    A --> B("① Bloom — Downsample → Threshold → Upsample")
    B --> C("② Depth of Field — CoC → Bokeh blur")
    C --> D("③ Auto-Exposure — Luminance reduction → PBO readback")
    D --> E("④ Motion Blur — Tile-max velocity → Neighbor-max")
    E --> F("⑤ Bind 9 Textures + Upload UBO")
    F --> H("Draw fullscreen quad")
    H --> J("Vignette")
    J --> K("Film Grain")
    K --> L("White Balance")
    L --> M("Color Grading — Sat, Contrast, Gamma, Gain")
    M --> N("Tonemapping — filmic curve")
    N --> O("3D LUT Grading")
    O --> P("FXAA")
    P --> Q("Dithering — anti-banding")
    Q --> R("Atmospheric Fog")

    classDef keyFunc fill:#fff59d,stroke:#f9a825,stroke-width:2px,color:#2d2d2d
    classDef compute fill:#f3e5f5,stroke:#ab47bc,stroke-width:1.5px,color:#2d2d2d
    classDef shader fill:#e3f2fd,stroke:#42a5f5,stroke-width:1.5px,color:#2d2d2d
    classDef subsystem fill:#ffffff,stroke:#aaaaaa,stroke-width:1.5px,color:#444444

    class A,F,H subsystem
    class B,C,D,E compute
    class J,K,L,M,N,O,P,Q,R shader

10.2 — Post-Processing Effects Gallery

Here is the front view render with different effects enabled individually — each image shows a single effect applied to the same scene:

No post-processing (raw)

Raw render without any post-processing

Raw image from the PBR renderer — no post-processing applied.

FXAA (fast anti-aliasing)

Render with FXAA enabled

FXAA (Fast Approximate Anti-Aliasing) — smooths edges without the cost of MSAA.

Bloom

Render with Bloom enabled

Bloom — bright areas bleed outward, simulating lens light diffusion.

Depth of Field

Render with Depth of Field enabled

Depth of Field: out-of-focus objects are blurred, as with a real lens.

Auto-Exposure

Render with Auto-Exposure enabled

Auto-exposure — the engine adapts exposure like the human eye adjusting to brightness.

Motion Blur

Render with Motion Blur enabled

Per-pixel motion blur — uses velocity vectors to simulate cinematic motion blur.

Sony A7S III Cinematic Profile

Render with Sony A7S III profile

Full Sony A7S III photographic profile — color grading, white balance, exposure, and 3D LUT combined for a cinematic look.

10.3 — Shader Optimization via Conditional Compilation

The post-process fragment shader uses compile-time #defines so that disabled effects are compiled out entirely instead of being skipped by runtime branches:

#ifdef OPT_ENABLE_BLOOM
    color += bloomTexture * bloomIntensity;
#endif

#ifdef OPT_ENABLE_FXAA
    color = fxaa(color, uv, texelSize);
#endif

A 32-entry LRU cache stores compiled shader variants for different effect flag combinations. Switching effects triggers lazy recompilation only for new combinations.
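A variant cache of this kind can be sketched in a few lines of C. This is only an illustration of the LRU idea, not the engine's actual code: `VariantCache`, `variant_cache_get`, and the `fake_compile` stand-in are hypothetical names, and the effect flags are assumed to fit in a 32-bit mask.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define VARIANT_CACHE_CAPACITY 32 /* the 32-entry size mentioned in the text */

/* Hypothetical entry: 'flags' is a bitmask of enabled effects and
 * 'program' stands in for a linked GL program handle. */
typedef struct {
    uint32_t flags;
    uint32_t program;
    uint64_t last_used; /* logical timestamp of the most recent lookup */
    int      occupied;
} VariantEntry;

typedef struct {
    VariantEntry entries[VARIANT_CACHE_CAPACITY];
    uint64_t     clock;    /* incremented on every lookup */
    unsigned     compiles; /* number of cache misses so far */
} VariantCache;

/* Hypothetical compile callback: builds a program for one flag combination. */
typedef uint32_t (*CompileFn)(uint32_t flags);

/* Stand-in compiler for illustration; a real one would build a GLSL program
 * with the matching #define lines prepended. */
static uint32_t fake_compile(uint32_t flags) { return flags + 1000u; }

static uint32_t variant_cache_get(VariantCache* cache, uint32_t flags,
                                  CompileFn compile)
{
    assert(cache != NULL && compile != NULL);
    cache->clock++;

    VariantEntry* victim = &cache->entries[0];
    for (size_t i = 0; i < VARIANT_CACHE_CAPACITY; i++) {
        VariantEntry* e = &cache->entries[i];
        if (e->occupied && e->flags == flags) {
            e->last_used = cache->clock; /* hit: refresh recency */
            return e->program;
        }
        /* Track the slot to overwrite on a miss: a free slot wins,
         * otherwise the entry with the oldest timestamp. */
        if (victim->occupied &&
            (!e->occupied || e->last_used < victim->last_used)) {
            victim = e;
        }
    }

    /* Miss: compile lazily and replace the chosen slot. A real engine
     * would also delete the evicted GL program here. */
    victim->flags = flags;
    victim->program = compile(flags);
    victim->last_used = cache->clock;
    victim->occupied = 1;
    cache->compiles++;
    return victim->program;
}
```

A zero-initialized `VariantCache` is a valid empty cache. With only 32 entries, a linear scan is likely cheaper than maintaining a hash table plus a linked list.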

10.4 — Tonemapping Curves

Tonemapping curves comparison

Comparison of available tonemapping curves — the transformation from linear HDR to displayable LDR.
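To make the HDR-to-LDR mapping concrete, here is a sketch of two widely used curves evaluated per channel: the classic Reinhard operator and Krzysztof Narkowicz's fitted ACES approximation. Which curves the engine actually ships is not stated here, so treat both as illustrative assumptions; the function names are hypothetical.

```c
#include <assert.h>

static float clamp01(float x)
{
    return x < 0.0f ? 0.0f : (x > 1.0f ? 1.0f : x);
}

/* Reinhard: compresses [0, inf) into [0, 1) and never quite reaches white. */
static float tonemap_reinhard(float x)
{
    assert(x >= 0.0f);
    return x / (1.0f + x);
}

/* Narkowicz's ACES fit: a filmic S-curve with a toe and a shoulder.
 * The result is clamped because the rational fit slightly exceeds 1
 * for very bright inputs. */
static float tonemap_aces_fit(float x)
{
    assert(x >= 0.0f);
    const float a = 2.51f, b = 0.03f, c = 2.43f, d = 0.59f, e = 0.14f;
    return clamp01((x * (a * x + b)) / (x * (c * x + d) + e));
}
```

Both functions expect linear, exposure-scaled input. Gamma or sRGB encoding is a separate step applied afterwards.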

10.5 — Exposure Adaptation

Exposure adaptation over time

Auto-exposure progressively adapts frame brightness, like the eye's iris adjusting to light.
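One common way to get this gradual adaptation is a first-order low-pass filter on the exposure value. The sketch below uses a backward Euler step of dE/dt = speed × (target − E), which cannot overshoot the target even on a very long frame. The function name and the `speed` parameter are assumptions; the engine's actual formula may differ.

```c
#include <assert.h>

/* Moves 'current' toward 'target'. 'speed' is in 1/seconds: larger values
 * adapt faster. The blend factor dt*speed / (1 + dt*speed) always lies in
 * [0, 1), so the result stays between 'current' and 'target'. */
static float adapt_exposure(float current, float target, float dt, float speed)
{
    assert(dt >= 0.0f && speed >= 0.0f);
    float k = dt * speed;
    float blend = k / (1.0f + k);
    return current + (target - current) * blend;
}
```

Called once per frame with the frame's delta time, the exposure converges smoothly whether the scene gets brighter or darker.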


Chapter 11 — The First Visible Frame

Let's trace what actually appears on screen during the first seconds:

Startup Timeline

| Frames | What Happens | On Screen |
|---|---|---|
| 1–2 | Async loader reads env.hdr from disk | Black screen (transition_alpha = 1.0) |
| 3–4 | PBO → GPU texture transfer (DMA) + mipmap generation | Black screen |
| 5–15 | Progressive IBL computation (luminance, specular, irradiance) | Black screen (but spheres are rendered into FBO) |
| ~16 | IBL complete → TRANSITION_FADE_IN | Fade-in begins |
| ~20+ | Transition complete, steady state | Fully-lit PBR scene |

Steady-State Frame

| Step | Detail | Time |
|---|---|---|
| 1. Poll Events | glfwPollEvents() | ~0.1 ms CPU |
| 2. Camera Update | 60 Hz physics + rotation lerp | ~0.01 ms CPU |
| 3a. Skybox | Fullscreen quad, equirectangular sampling | ~0.2 ms GPU |
| 3b. Bitonic Sort | Compute shader, 100 spheres | ~0.1 ms GPU |
| 3c. Billboard Spheres | 100 ray-traced quads, 1 draw call | ~0.5 ms GPU |
| 4a. Bloom | Downsample → Upsample (if enabled) | ~0.3 ms GPU |
| 4b. DoF | CoC → Bokeh blur (if enabled) | ~0.2 ms GPU |
| 4c. Auto-Exposure | Luminance reduction | ~0.1 ms GPU |
| 4d. Motion Blur | Tile-max velocity (if enabled) | ~0.2 ms GPU |
| 4e. Final Composite | 9 textures, UBO, fullscreen quad | ~0.3 ms GPU |
| 5. UI Overlay | Text + profiler + transition | ~0.1 ms GPU |
| 6. SwapBuffers | Present to screen | (wait) |
| Typical frame time | | 1–3 ms GPU |
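The camera update in step 2 runs physics at a fixed 60 Hz while frames arrive at a variable rate. A common way to decouple the two is a time accumulator. The sketch below is a generic version of that pattern with hypothetical names (`FixedClock`, `fixed_clock_advance`) and is not taken from the engine.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    double accumulator; /* simulated time still owed, in seconds */
    double step;        /* fixed physics interval, e.g. 1.0 / 60.0 */
} FixedClock;

/* Adds one frame's elapsed time and returns how many fixed physics steps
 * the caller should run this frame. 'max_steps' caps the catch-up work
 * after a long stall; any backlog beyond the cap is dropped. */
static unsigned fixed_clock_advance(FixedClock* clk, double frame_dt,
                                    unsigned max_steps)
{
    assert(clk != NULL && clk->step > 0.0 && frame_dt >= 0.0);
    clk->accumulator += frame_dt;

    unsigned steps = 0;
    while (clk->accumulator >= clk->step && steps < max_steps) {
        clk->accumulator -= clk->step;
        steps++;
    }
    if (steps == max_steps && clk->accumulator >= clk->step) {
        clk->accumulator = 0.0; /* give up on the remaining backlog */
    }
    return steps;
}
```

In a render loop this would look like `for (unsigned i = fixed_clock_advance(&clk, dt, 8); i > 0; i--) update_physics(clk.step);`, where `update_physics` is a placeholder for the camera integration. The leftover fraction `accumulator / step` can be used to interpolate between the last two physics states.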

Chapter 12 — GPU Memory Budget

Here's an estimate of VRAM consumption at steady state:

Textures

| Resource | Resolution | Format | Size |
|---|---|---|---|
| HDR Environment | 2048×1024 | GL_RGBA16F | ~16 MB (with mips) |
| Specular Prefilter | 1024² × 5 mips | GL_RGBA16F | ~10.5 MB |
| Irradiance | 64×64 | GL_RGBA16F | ~32 KB |
| BRDF LUT | 512×512 | GL_RG16F | ~1 MB |
| Scene Color (FBO) | 1920×1080 | GL_RGBA16F | ~16 MB |
| Velocity (FBO) | 1920×1080 | GL_RG16F | ~8 MB |
| Depth/Stencil (FBO) | 1920×1080 | GL_DEPTH32F_STENCIL8 | ~10 MB |
| Bloom chain (6 mips) | Various | GL_RGBA16F | ~21 MB |
| DoF blur | 1920×1080 | GL_RGBA16F | ~16 MB |
| Auto-Exposure | 64×64 → 1×1 | GL_R32F | ~16 KB |
| SH Probes (7 tex) | 21×21×3 | GL_RGBA16F | ~74 KB |
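Most of these figures follow from width × height × bytes per texel, summed over the mip levels. Here is a minimal sketch of that arithmetic, assuming tightly packed 2D textures; drivers may pad or align allocations, so real usage can be higher. `texture_bytes` is a hypothetical helper, not part of the engine.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes for a 2D texture with 'levels' mip levels (level 0 is full size).
 * Each level halves width and height, never going below 1 texel. */
static uint64_t texture_bytes(uint32_t width, uint32_t height,
                              uint32_t bytes_per_texel, uint32_t levels)
{
    assert(width > 0 && height > 0 && bytes_per_texel > 0 && levels > 0);
    uint64_t total = 0;
    for (uint32_t i = 0; i < levels; i++) {
        total += (uint64_t)width * height * bytes_per_texel;
        if (width == 1 && height == 1) {
            break; /* the chain cannot get any smaller */
        }
        width  = width  > 1 ? width  / 2 : 1;
        height = height > 1 ? height / 2 : 1;
    }
    return total;
}
```

For example, `texture_bytes(1920, 1080, 8, 1)` gives 16,588,800 bytes for a GL_RGBA16F render target (8 bytes per texel), which corresponds to the ~16 MB listed for the scene color buffer. `texture_bytes(512, 512, 4, 1)` gives exactly 1 MiB for the GL_RG16F BRDF LUT.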

Buffers

| Resource | Count | Size Each | Total |
|---|---|---|---|
| Billboard quad VBO | 4 verts | 12 B (vec3) | 48 B |
| Instance VBO | 100 instances | ~88 B | ~8.6 KB |
| Sort SSBO | 100 entries | 8 B | ~800 B |
| Fullscreen quad VBO | 6 verts | 20 B | 120 B |
| UBO (post-process) | 1 | ~256 B | 256 B |

Total Estimate

| Category | Approximate |
|---|---|
| Textures | ~99 MB |
| Buffers | ~40 KB |
| Shaders (compiled) | ~2 MB |
| Total | ~101 MB VRAM |

💡 Dominant Cost: The HDR environment map + bloom chain + scene FBOs dominate VRAM usage. The geometry itself (100 billboard quads × 4 vertices in default mode) is negligible — the real sphere computation happens in the fragment shader via ray-tracing.


The engine supports automated captures from different camera angles, used for visual regression testing:

Captured views: Front, Left, Right, Top, Bottom, and the Sony A7S III profile.

Full Data Flow Pipeline

graph TD
    POLL("① CPU — glfwPollEvents()") --> TIME("② CPU — Δt calculation")
    TIME --> CAM("③ CPU — Camera physics 60Hz")
    CAM --> SORT("④ CPU → GPU — Sphere sorting")
    SORT --> FBO("⑤ GPU — Bind Scene FBO, Clear")
    FBO --> SKY("🌅 Skybox Pass — Equirectangular sampling")
    SKY --> SPHERES("🔮 Billboard Pass — 1 draw call, 100 instances, ray-tracing")
    SPHERES --> BLOOM("✨ Bloom + DoF + Auto-Exposure + Motion Blur")
    BLOOM --> COMP("🎬 Final Composite — 9 textures, UBO, fullscreen quad")
    COMP --> UI("🖥️ UI Overlay + Profiler + Transition")
    UI --> SWAP("⑩ glfwSwapBuffers()")

    classDef cpu fill:#e3f2fd,stroke:#42a5f5,stroke-width:1.5px,color:#2d2d2d
    classDef gpu fill:#fff59d,stroke:#f9a825,stroke-width:2px,color:#2d2d2d
    classDef postfx fill:#f3e5f5,stroke:#ab47bc,stroke-width:1.5px,color:#2d2d2d
    classDef subsystem fill:#ffffff,stroke:#aaaaaa,stroke-width:1.5px,color:#444444

    class POLL,TIME,CAM,SORT cpu
    class FBO,SKY,SPHERES gpu
    class BLOOM,COMP postfx
    class UI,SWAP subsystem

Glossary

Quick reference for technical terms used in this article, with links to official documentation.

Languages, APIs & Standards

Term Description Link
C11 2011 revision of the C language standard, used for the entire engine cppreference — C11
OpenGL 4.4 Low-level graphics API for communicating with the GPU OpenGL 4.4 Spec (Khronos)
Core Profile OpenGL mode that removes deprecated functions (fixed-function pipeline) OpenGL Wiki — Core Profile
GLSL OpenGL Shading Language — the language for GPU programs (shaders) GLSL Spec (Khronos)
GLFW C library for creating windows and handling keyboard/mouse input glfw.org
GLAD OpenGL loader generator — resolves GL function addresses at runtime GLAD Generator
GLX X11 extension bridging the Window System and OpenGL on Linux GLX Spec (Khronos)
X11 Historic Linux windowing system (display server) X.Org
Mesa Open-source implementation of graphics APIs (OpenGL, Vulkan) on Linux mesa3d.org

3D Rendering — Core Concepts

Term Description Link
PBR Physically-Based Rendering — lighting model that simulates real physics of light learnopengl.com — PBR Theory
IBL Image-Based Lighting — lighting extracted from a panoramic HDR environment image learnopengl.com — IBL
HDR High Dynamic Range — color values exceeding 1.0 (realistic light intensities) learnopengl.com — HDR
LDR Low Dynamic Range — color values 0–255, what the screen actually displays learnopengl.com — HDR
Shader Program executed directly on the GPU (vertex, fragment, compute) OpenGL Wiki — Shader
Vertex Shader Shader that processes each geometry vertex (position, projection) OpenGL Wiki — Vertex Shader
Fragment Shader Shader that computes the color of each on-screen pixel OpenGL Wiki — Fragment Shader
Compute Shader General-purpose GPU shader outside the rendering pipeline OpenGL Wiki — Compute Shader
Skybox Panoramic image displayed as scene background (sky/environment) learnopengl.com — Cubemaps
Rasterization Process of converting 3D triangles into 2D pixels on screen OpenGL Wiki — Rasterization
Draw Call A CPU→GPU call requesting geometry rendering OpenGL Wiki — Rendering Pipeline
Instanced Rendering Technique to draw N copies of an object in a single draw call OpenGL Wiki — Instancing
Mipmap Pre-reduced versions of a texture (½, ¼, ⅛…) for cleaner filtering at distance OpenGL Wiki — Texture#Mip_maps

OpenGL GPU Objects

Term Description Link
FBO Framebuffer Object — offscreen render surface (draw to it instead of the screen) OpenGL Wiki — Framebuffer Object
MRT Multiple Render Targets — write to several textures in a single render pass OpenGL Wiki — MRT
VAO Vertex Array Object — describes the format of geometric data sent to the GPU OpenGL Wiki — VAO
VBO Vertex Buffer Object — GPU buffer containing vertex positions, normals, etc. OpenGL Wiki — VBO
SSBO Shader Storage Buffer Object — read/write GPU buffer accessible from shaders OpenGL Wiki — SSBO
UBO Uniform Buffer Object — data block shared between CPU and shaders OpenGL Wiki — UBO
PBO Pixel Buffer Object — buffer for asynchronous CPU↔GPU pixel transfers OpenGL Wiki — PBO
Texture View Alternate view of an existing texture's data (different format or layers) OpenGL Wiki — Texture View

Ray-Tracing & Geometry

Term Description Link
Ray-Tracing Technique that traces light rays to compute intersections with objects Scratchapixel — Ray-Sphere
Billboard Screen-facing quad used here as a ray-tracing surface OpenGL Wiki — Billboard
AABB Axis-Aligned Bounding Box — axis-aligned enclosing box for fast culling Wikipedia — AABB
Icosphere Sphere built by subdividing an icosahedron (20 faces) — more uniform than a UV sphere Wikipedia — Icosphere
Discriminant Mathematical value (b²−c) determining whether a ray hits a sphere Scratchapixel — Ray-Sphere
Normal Vector perpendicular to the surface at a point — determines surface orientation learnopengl.com — Basic Lighting
Tessellation Subdividing geometry into finer triangles for more detail OpenGL Wiki — Tessellation
Mesh Collection of triangles forming a 3D object Wikipedia — Polygon mesh
Quad Rectangle made of 2 triangles — the basic 2D primitive learnopengl.com — Hello Triangle
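The discriminant entry can be made concrete with a small CPU-side sketch. It assumes a unit-length ray direction, which is what allows the compact b² − c form. The types and function names are hypothetical, and the engine's shader version may be organized differently.

```c
#include <assert.h>
#include <stdbool.h>

typedef struct { float x, y, z; } Vec3;

static float dot3(Vec3 a, Vec3 b)
{
    return a.x * b.x + a.y * b.y + a.z * b.z;
}

/* Returns true if the ray origin + t*dir (t >= 0) touches the sphere.
 * 'dir' must be normalized. With oc = origin - center:
 *   b = dot(oc, dir),  c = dot(oc, oc) - r^2,  disc = b^2 - c
 * and the hit distances are t = -b +/- sqrt(disc). */
static bool ray_hits_sphere(Vec3 origin, Vec3 dir, Vec3 center, float radius)
{
    assert(radius > 0.0f);
    Vec3 oc = { origin.x - center.x, origin.y - center.y, origin.z - center.z };
    float b = dot3(oc, dir);
    float c = dot3(oc, oc) - radius * radius;
    float disc = b * b - c;
    if (disc < 0.0f) {
        return false; /* the infinite line misses the sphere */
    }
    /* The two roots multiply to c and sum to -2b, so a root with t >= 0
     * exists if the origin is inside the sphere (c <= 0) or the sphere
     * lies ahead of the ray (b < 0). No square root is needed for this. */
    return c <= 0.0f || b < 0.0f;
}
```

When the actual hit point is required, for example to compute the surface normal, the nearest distance is `-b - sqrt(disc)`.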

PBR & Lighting

Term Description Link
BRDF Bidirectional Reflectance Distribution Function — describes how light bounces off a surface learnopengl.com — PBR Theory
BRDF LUT Pre-computed texture encoding the BRDF integral for all (angle, roughness) combinations learnopengl.com — Specular IBL
Fresnel-Schlick Approximation of the Fresnel effect: surfaces reflect more at grazing angles learnopengl.com — PBR Theory
GGX / Smith-GGX Microfacet model for geometry and normal distribution (roughness) learnopengl.com — PBR Theory
NDF Normal Distribution Function — statistical distribution of microfacet orientations learnopengl.com — PBR Theory
Albedo Base color of a material (without lighting) learnopengl.com — PBR Theory
Metallic PBR parameter: 0 = dielectric (plastic, wood), 1 = metal (gold, chrome) learnopengl.com — PBR Theory
Roughness PBR parameter: 0 = perfect mirror, 1 = completely matte learnopengl.com — PBR Theory
AO Ambient Occlusion — darkens crevices and corners (ambient light occlusion) learnopengl.com — SSAO
Dielectric Non-metallic material (plastic, glass, wood) — reflects little at direct angles learnopengl.com — PBR Theory
Irradiance Map Texture encoding hemisphere-integrated diffuse ambient light for each direction learnopengl.com — Diffuse Irradiance
Specular Prefilter Mip-mapped texture encoding blurred reflections by roughness level learnopengl.com — Specular IBL
Equirectangular 2D projection of a sphere (like a world map) — the format of .hdr images Wikipedia — Equirectangular
SH Probes Spherical Harmonics — compact representation of a low-frequency light field Wikipedia — SH Lighting
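As an example of how compact these terms are in practice, the Fresnel-Schlick approximation is F = F0 + (1 − F0)(1 − cos θ)⁵. Here is a scalar sketch in C; the function name is hypothetical, and the 0.04 base reflectance in the usage note is the common convention for dielectrics rather than a value taken from the engine.

```c
#include <assert.h>

/* cos_theta: cosine of the angle between the view vector and the
 *            half vector (or the normal), expected in [0, 1].
 * f0:        reflectance at normal incidence, in [0, 1]. */
static float fresnel_schlick(float cos_theta, float f0)
{
    assert(cos_theta >= 0.0f && cos_theta <= 1.0f);
    assert(f0 >= 0.0f && f0 <= 1.0f);
    float m = 1.0f - cos_theta;
    float m2 = m * m;
    float m5 = m2 * m2 * m; /* (1 - cos_theta)^5 without calling pow() */
    return f0 + (1.0f - f0) * m5;
}
```

`fresnel_schlick(1.0f, 0.04f)` returns the base reflectance when looking straight at the surface, and the result rises toward 1 as `cos_theta` approaches 0 at grazing angles. For metals the same formula is applied per color channel with F0 set to the albedo.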

Post-Processing

Term Description Link
Bloom Glow halo around very bright areas (lens light diffusion) learnopengl.com — Bloom
Depth of Field (DoF) Blur of objects outside the focus distance Wikipedia — Depth of field
CoC Circle of Confusion — blur disc diameter for an out-of-focus point Wikipedia — CoC
Bokeh Aesthetic shape of background blur (discs, hexagons…) Wikipedia — Bokeh
Motion Blur Per-pixel motion blur simulating a camera shutter GPU Gems — Motion Blur
FXAA Fast Approximate Anti-Aliasing — fast post-process anti-aliasing on the final image NVIDIA — FXAA
MSAA Multisample Anti-Aliasing — geometric anti-aliasing (expensive, avoided here) OpenGL Wiki — Multisampling
Tonemapping Conversion of HDR colors (unbounded) to displayable LDR (0–255) learnopengl.com — HDR
Color Grading Creative color adjustments (saturation, contrast, gamma, hue) Wikipedia — Color grading
3D LUT 3D color lookup table for a cinematic "look" (.cube file) Wikipedia — 3D LUT
Vignette Progressive darkening of image edges (lens effect) Wikipedia — Vignetting
Dithering Adding imperceptible noise to break banding artifacts in gradients Wikipedia — Dither
Auto-Exposure Automatic scene brightness adaptation (simulates the eye's iris) learnopengl.com — HDR
VSync Vertical Sync — syncs rendering with screen refresh (prevents tearing) Wikipedia — VSync

Architecture & Performance

Term Description Link
SIMD Single Instruction, Multiple Data — vector computation (1 instruction processes 4+ values) Wikipedia — SIMD
SSE Intel/AMD SIMD extensions for x86 (128-bit registers) Intel — SSE Intrinsics
NEON ARM SIMD extensions (smartphones, Apple Silicon, Raspberry Pi) ARM — NEON
VRAM Dedicated GPU memory — where textures and buffers reside Wikipedia — VRAM
DMA Direct Memory Access — data transfer without CPU involvement Wikipedia — DMA
Cache-friendly Memory layout that minimizes CPU cache misses (contiguous data) Wikipedia — Cache
LRU Cache Least Recently Used — cache that evicts the least recently used entry Wikipedia — LRU
Fence (GLSync) GPU sync object — lets you wait for GPU work to complete OpenGL Wiki — Sync Object
Memory Barrier GPU instruction ensuring previous writes are visible before subsequent reads OpenGL Wiki — Memory Barrier
Work Group Group of GPU threads executed together in a compute shader (e.g. 16×16 = 256 threads) OpenGL Wiki — Compute Shader
Dispatch CPU call that launches a compute shader on the GPU OpenGL Wiki — Compute Shader
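The number of work groups passed to a dispatch is usually the problem size divided by the local size, rounded up. Here is a small sketch with a hypothetical helper name; the local sizes in the usage note are illustrative and not taken from the engine's shaders.

```c
#include <assert.h>
#include <stdint.h>

/* Number of work groups needed to cover 'items' elements when each group
 * processes 'local_size' of them along one dimension (ceiling division,
 * written so it cannot overflow). */
static uint32_t work_group_count(uint32_t items, uint32_t local_size)
{
    assert(local_size > 0);
    return items / local_size + (items % local_size != 0 ? 1u : 0u);
}
```

Covering a 1920×1080 image with 16×16 groups needs `work_group_count(1920, 16)` by `work_group_count(1080, 16)` groups, that is 120 × 68. Because the last row of groups extends past the image, the shader has to bounds-check its invocation ID before reading or writing.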

Mathematics & Camera

Term Description Link
mat4 / vec3 4×4 matrix and 3D vector — fundamental 3D types cglm docs
FOV Field of View — camera viewing angle (60° here) Wikipedia — FOV
View Matrix Transforms world coordinates into camera coordinates (lookAt) learnopengl.com — Camera
Projection Matrix Transforms 3D to 2D with perspective (far objects = smaller) learnopengl.com — Coordinate Systems
Yaw / Pitch Yaw = left-right rotation, Pitch = up-down rotation of the camera learnopengl.com — Camera
Lerp Linear Interpolation — smooth transition between two values: a + t × (b − a) Wikipedia — Lerp
EMA Exponential Moving Average — weighted average favoring recent values Wikipedia — EMA
Smoothstep S-curve interpolation function (smooth transition between 0 and 1) Khronos — smoothstep
Z-buffer / Depth Buffer Texture storing per-pixel depth to handle occlusion learnopengl.com — Depth Testing
Stencil Buffer Per-pixel mask restricting rendering to specific areas learnopengl.com — Stencil
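Three of the interpolation helpers above fit in a few lines each. The sketch below uses hypothetical function names; `smoothstepf` follows the GLSL `smoothstep` definition (clamp, then 3t² − 2t³).

```c
#include <assert.h>

/* Linear interpolation: t = 0 returns a, t = 1 returns b. */
static float lerpf(float a, float b, float t)
{
    return a + t * (b - a);
}

/* S-curve between edge0 and edge1, clamped to [0, 1] outside that range. */
static float smoothstepf(float edge0, float edge1, float x)
{
    assert(edge0 < edge1);
    float t = (x - edge0) / (edge1 - edge0);
    t = t < 0.0f ? 0.0f : (t > 1.0f ? 1.0f : t);
    return t * t * (3.0f - 2.0f * t);
}

/* Exponential moving average: alpha near 1 follows new samples quickly,
 * alpha near 0 smooths heavily. */
static float ema_update(float average, float sample, float alpha)
{
    assert(alpha > 0.0f && alpha <= 1.0f);
    return average + alpha * (sample - average);
}
```

Note that `ema_update` is simply `lerpf(average, sample, alpha)` applied repeatedly, which is why the two often appear together in smoothing code such as frame-time displays.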

Sorting Algorithms

Term Description Link
Bitonic Sort Parallel sort suited for GPUs — compares and swaps in pairs Wikipedia — Bitonic sort
Radix Sort Sort by successive digits — O(n·k), efficient on CPU for integer keys Wikipedia — Radix sort
Back-to-front Rendering order from farthest to nearest, required for correct transparency Wikipedia — Painter's algorithm
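To show what "compares and swaps in pairs" means, here is a CPU reference of the bitonic network that orders entries back to front (largest depth first). It is a sketch under assumptions: the 8-byte `SortEntry` layout is a guess based on the sort buffer's entry size, the names are hypothetical, and the element count must be a power of two (a 100-element list would be padded to 128 with sentinel entries).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    float    depth; /* distance from the camera */
    uint32_t index; /* which instance this entry refers to */
} SortEntry;

/* Sorts 'entries' by descending depth. 'count' must be a power of two.
 * On a GPU, each iteration of the innermost loop would be one thread and
 * each (k, j) pair one dispatch or barrier-separated pass. */
static void bitonic_sort_back_to_front(SortEntry* entries, size_t count)
{
    assert(entries != NULL);
    assert(count > 0 && (count & (count - 1)) == 0);

    for (size_t k = 2; k <= count; k <<= 1) {      /* size of sorted runs */
        for (size_t j = k >> 1; j > 0; j >>= 1) {  /* compare distance */
            for (size_t i = 0; i < count; i++) {
                size_t partner = i ^ j;
                if (partner <= i) {
                    continue; /* each pair is handled once, by its lower index */
                }
                /* Alternate the direction per block of size k so that
                 * adjacent runs form bitonic sequences. */
                bool descending = (i & k) == 0;
                bool out_of_order = descending
                    ? entries[i].depth < entries[partner].depth
                    : entries[i].depth > entries[partner].depth;
                if (out_of_order) {
                    SortEntry tmp = entries[i];
                    entries[i] = entries[partner];
                    entries[partner] = tmp;
                }
            }
        }
    }
}
```

The comparison pattern depends only on the indices, never on the data, which is what makes the algorithm a good fit for a compute shader.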

Multithreading

Term Description Link
POSIX Threads Standard threading API on Unix/Linux (pthread_create, pthread_cond_wait) man — pthreads
Async Loading Running disk I/O on a separate thread to avoid blocking the render loop Wikipedia — Async I/O
Condition Variable Sync mechanism: a thread sleeps until another signals it man — pthread_cond_wait

Miscellaneous

Term Description Link
Tracy Real-time profiler for games and graphics apps (per-frame CPU + GPU measurement) Tracy Profiler (GitHub)
cglm SIMD-optimized C math library for 3D (matrices, vectors, quaternions) cglm (GitHub)
stb_image Single-header C library for loading images (PNG, JPEG, HDR…) stb (GitHub)
Fixed Timestep Physics update at a constant interval (e.g. 60 Hz) regardless of framerate Fix Your Timestep! (Fiedler)
Game Loop Main application loop: read input → update → render → repeat Game Programming Patterns — Game Loop
Fireflies Ultra-bright aberrant pixels caused by extreme HDR values (artifact) Physically Based — Fireflies
Premultiplied Alpha Convention where RGB is already multiplied by alpha — enables correct blending Wikipedia — Premultiplied alpha

Conclusion

suckless-ogl demonstrates that a complete PBR engine can be built with readable C11 code, a clear rendering pipeline, and a steady-state GPU frame time of roughly 1–3 ms. The design choices (billboard ray-tracing instead of meshes, async HDR loading, progressive IBL, modular post-processing) show how a small codebase can solve real graphics problems cleanly.

The full source code is available on GitHub, and the detailed technical documentation at yoyonel.github.io/suckless-ogl.

In upcoming articles, we'll explore the Vulkan and NVRHI projects that push these concepts even further.