Asynchronous Environment Map Loader
This document describes the asynchronous loading system implemented to handle heavy HDR environment maps without blocking the main application loop.
Overview
Loading high-resolution HDR textures (e.g., 4K or 8K .hdr files) can take several hundred milliseconds, or even seconds, depending on disk speed. Performing this operation on the main thread freezes the entire application (it stops rendering and processing input), leading to a poor user experience.
The Async Loader decouples the Disk I/O and CPU decompression steps from the Main Thread, moving them to a background worker thread.
Architecture
The system consists of three main components working together to ensure zero-stall loading:
- Async Loader Module (src/async_loader.c)
  - Manages a background worker thread.
  - Handles heavy I/O and SIMD format conversion.
  - Implements the WAITING_FOR_PBO protocol to interact with the main thread.
- Async Coordinator (src/async/async_coordinator.h)
  - Manages double-buffered PBOs on the main thread.
  - Handles the mapping and provision of GPU memory to the worker.
- Environment Manager (src/env_manager.c)
  - Orchestrates the multi-step loading process over several frames.
Data Flow (PBO-based)
The loading process follows a specialized "handshake" to avoid shared contexts:
- Main Thread: Calls async_loader_request(path).
- Worker Thread: Loads the file from disk and decodes it into a CPU float buffer. Transitions to ASYNC_WAITING_FOR_PBO.
- Main Thread: Detects the waiting state during poll(). It maps a PBO and provides the pointer to the worker.
- Worker Thread: Converts the float data directly into the mapped PBO memory using SIMD instructions. Transitions to ASYNC_READY.
- Main Thread: Detects ASYNC_READY. It unmaps the PBO and performs a glTexSubImage2D (DMA transfer) to the GPU texture.
This architecture ensures that the Main Thread never touches the pixel data and the Worker Thread never touches the OpenGL context.
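
The handshake can be modelled as a small state machine in which each thread only advances the state it is responsible for. The sketch below is a simplified, GL-free model, not the actual loader: ASYNC_WAITING_FOR_PBO and ASYNC_READY come from the description above, while ASYNC_IDLE, ASYNC_LOADING, ASYNC_CONVERTING, the AsyncJob struct, and all function names are assumptions made for illustration. A plain memcpy stands in for the SIMD conversion, and an ordinary array can stand in for the mapped PBO.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <string.h>

typedef enum {
    ASYNC_IDLE,            /* hypothetical */
    ASYNC_LOADING,         /* hypothetical */
    ASYNC_WAITING_FOR_PBO, /* named in the text */
    ASYNC_CONVERTING,      /* hypothetical */
    ASYNC_READY            /* named in the text */
} AsyncState;

typedef struct {
    atomic_int state;     /* holds an AsyncState; read and written by both threads */
    const float *decoded; /* CPU float buffer produced by the worker */
    size_t count;         /* number of floats in 'decoded' */
    float *mapped;        /* PBO memory mapped by the main thread */
} AsyncJob;

/* Main thread, step 1: start a load. */
void job_request(AsyncJob *job) {
    assert(atomic_load(&job->state) == ASYNC_IDLE);
    job->decoded = NULL;
    job->mapped = NULL;
    job->count = 0;
    atomic_store(&job->state, ASYNC_LOADING);
}

/* Worker, step 2: publish the decoded pixels and ask for PBO memory. */
void worker_finish_decode(AsyncJob *job, const float *pixels, size_t count) {
    assert(atomic_load(&job->state) == ASYNC_LOADING);
    job->decoded = pixels;
    job->count = count;
    atomic_store(&job->state, ASYNC_WAITING_FOR_PBO);
}

/* Main thread, step 3: real code would map a PBO here and pass its pointer. */
int main_provide_pbo(AsyncJob *job, float *mapped_ptr) {
    if (atomic_load(&job->state) != ASYNC_WAITING_FOR_PBO) return 0;
    job->mapped = mapped_ptr;
    atomic_store(&job->state, ASYNC_CONVERTING);
    return 1;
}

/* Worker, step 4: write into the mapped memory (memcpy replaces SIMD here). */
int worker_convert(AsyncJob *job) {
    if (atomic_load(&job->state) != ASYNC_CONVERTING) return 0;
    memcpy(job->mapped, job->decoded, job->count * sizeof(float));
    atomic_store(&job->state, ASYNC_READY);
    return 1;
}

/* Main thread, step 5: real code would unmap the PBO and call glTexSubImage2D. */
int main_finish_upload(AsyncJob *job) {
    if (atomic_load(&job->state) != ASYNC_READY) return 0;
    atomic_store(&job->state, ASYNC_IDLE);
    return 1;
}
```

Because every function checks the state before acting, a call made at the wrong moment (for example, polling for ASYNC_READY while the worker is still decoding) is a cheap no-op, which is what lets the main thread poll once per frame without blocking.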
Scheduling Sequence
The complexity of the PBO-based approach is handled by splitting the work across several frames:
%%{init: {
"theme": "dark",
"themeVariables": {
"primaryColor": "#24283b",
"primaryTextColor": "#ffffff",
"primaryBorderColor": "#7aa2f7",
"lineColor": "#7aa2f7",
"signalColor": "#ffffff",
"signalTextColor": "#ffffff",
"messageColor": "#ffffff",
"messageTextColor": "#ffffff",
"labelTextColor": "#ffffff",
"actorTextColor": "#ffffff",
"actorBorder": "#7aa2f7",
"actorBkg": "#24283b",
"noteBkgColor": "#e0af68",
"noteTextColor": "#1a1b26"
}
}%%
sequenceDiagram
participant M as Main Thread
participant W as Worker Thread
participant G as GPU / VRAM
Note over M,W: Frame 1
M->>W: Request Load (path)
W->>W: I/O Read + Decode (CPU RAM)
Note over M,W: Frame N (Worker finishes I/O)
W-->>M: State = WAITING_FOR_PBO
M->>M: Map PBO (Unsynchronized)
M->>W: Pass PBO Pointer
Note over M,W: Frame N+1 (Worker does SIMD)
W->>W: SIMD Convert (F32 to F16) into PBO
W-->>M: State = READY
Note over M,W: Frame N+2 (Final Integration)
M->>M: Unmap PBO
M->>G: glTexSubImage2D (Fast DMA)
M->>G: glGenerateMipmap
M->>M: Start Progressive IBL
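
For reference, the "SIMD Convert (F32 to F16)" step in the diagram can be understood through a scalar equivalent. The sketch below is not the loader's SIMD routine; the function names are hypothetical, and it makes simplifying choices that the real code may not share: the mantissa is truncated rather than rounded, values too small for a normal half are flushed to zero, and finite values too large for half precision saturate to the largest finite half (65504) instead of becoming infinity.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Scalar float32 -> float16 bit conversion (simplified, see assumptions above). */
uint16_t f32_to_f16(float f) {
    uint32_t x;
    memcpy(&x, &f, sizeof x); /* reinterpret the float's bits safely */

    uint32_t sign = (x >> 16) & 0x8000u;
    uint32_t exp  = (x >> 23) & 0xFFu;
    uint32_t mant = x & 0x7FFFFFu;

    if (exp == 0xFFu) { /* Inf or NaN: keep the class, force a quiet NaN bit */
        return (uint16_t)(sign | 0x7C00u | (mant ? 0x0200u : 0u));
    }

    int32_t e = (int32_t)exp - 127 + 15; /* rebias exponent from 8 to 5 bits */
    if (e >= 31) {
        return (uint16_t)(sign | 0x7BFFu); /* saturate to largest finite half */
    }
    if (e <= 0) {
        return (uint16_t)sign; /* flush subnormals and underflow to signed zero */
    }
    /* Keep the top 10 mantissa bits (truncation, no rounding). */
    return (uint16_t)(sign | ((uint32_t)e << 10) | (mant >> 13));
}

/* Converts 'n' floats from 'src' into half-precision bit patterns in 'dst'. */
void convert_f32_to_f16(const float *src, uint16_t *dst, size_t n) {
    assert(src != NULL && dst != NULL);
    for (size_t i = 0; i < n; ++i) {
        dst[i] = f32_to_f16(src[i]);
    }
}
```

Saturating instead of overflowing to infinity is a deliberate choice for this sketch: HDR maps often contain very bright texels, and an infinite value would propagate through later filtering passes.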
The 3-Step Integration
To further reduce frame spikes, the final GPU integration is split across frames:
- Step 1: Upload: Data is moved from the PBO to a pre-allocated texture.
- Step 2: Mipmaps: glGenerateMipmap is called (allocated on first use).
- Step 3: IBL Start: The IBLCoordinator begins slicing the map for irradiance and pre-filtering.
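
One way to spread these steps over consecutive frames is a tiny stepper that yields exactly one step per call. This is a minimal sketch under assumed names (IntegrationStep, Integration, and integration_take_step are hypothetical); the GL and IBL calls are shown only as comments, since how src/env_manager.c actually sequences them is not documented here.

```c
#include <assert.h>
#include <stddef.h>

typedef enum {
    STEP_UPLOAD,    /* frame A: unmap PBO + glTexSubImage2D */
    STEP_MIPMAPS,   /* frame B: glGenerateMipmap */
    STEP_IBL_START, /* frame C: hand the texture to the IBL coordinator */
    STEP_DONE
} IntegrationStep;

typedef struct {
    IntegrationStep next; /* step to execute on the next frame */
} Integration;

/* Returns the step to execute this frame and advances the stepper.
 * Once finished, it keeps returning STEP_DONE. */
IntegrationStep integration_take_step(Integration *it) {
    assert(it != NULL);
    IntegrationStep current = it->next;
    if (current != STEP_DONE) {
        it->next = (IntegrationStep)(current + 1);
    }
    return current;
}

/* Per-frame usage (GPU calls omitted in this sketch):
 *
 *   switch (integration_take_step(&it)) {
 *   case STEP_UPLOAD:    ... upload from PBO ...      break;
 *   case STEP_MIPMAPS:   ... generate mipmaps ...     break;
 *   case STEP_IBL_START: ... start progressive IBL ... break;
 *   case STEP_DONE:      break;
 *   }
 */
```

Calling the stepper once per frame bounds the extra GPU work in any single frame to one of the three operations.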
Detailed PBO Strategy
For more technical details on the PBO implementation, see Async PBO Upload.