Post-Processing UBO Architecture
This document details the implementation of the Uniform Buffer Object (UBO) used to manage post-processing parameters in the engine. The move away from individual uniforms (glUniform*) was made to reduce CPU overhead and improve code clarity.
1. Overview
Instead of sending dozens of uniforms (floats, integers) one by one every frame, we bundle all configuration data (Vignette, Bloom, Exposure, FXAA, etc.) into a single contiguous memory structure.
- C Side: struct PostProcessUBO (file include/postprocess.h)
- GPU Side: uniform block with std140 layout (file shaders/postprocess/ubo.glsl)
- Transfer: A single glBufferSubData call per frame.
2. Critical Constraints: std140 Layout
The std140 layout imposes strict alignment rules in GPU memory. To ensure the driver reads data correctly, the C structure must mimic this alignment byte-for-byte.
std140 Alignment Rules (Simplified)
- Scalars (float, int, bool): Base alignment N (4 bytes).
- Vectors (vec2): Alignment 2N (8 bytes).
- Vectors (vec3, vec4): Alignment 4N (16 bytes).
- Arrays / Structures: Base alignment rounded up to a multiple of 16 bytes (size of a vec4). For arrays, this also applies to the stride of each element.
- Padding: There is no implicit padding between scalars, unless necessary to respect the alignment of the next member.
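The rules above can be sketched as a small offset calculator. This is an illustrative helper, not part of the engine: the names std140_align_up and std140_place and the Std140Align enum are assumptions made for this example, and only scalars and vectors are covered.

```c
#include <assert.h>
#include <stddef.h>

/* Base alignments in bytes for a few std140 member kinds (N = 4). */
typedef enum {
    STD140_SCALAR = 4,  /* float, int, uint, bool */
    STD140_VEC2   = 8,
    STD140_VEC3   = 16, /* aligned like a vec4, but only 12 bytes are used */
    STD140_VEC4   = 16
} Std140Align;

/* Round offset up to the next multiple of align (a power of two). */
static size_t std140_align_up(size_t offset, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (offset + align - 1) & ~(align - 1);
}

/* Place a member with the given alignment and size at the cursor.
 * Returns the member's offset and advances *cursor past it. */
static size_t std140_place(size_t *cursor, size_t align, size_t size)
{
    size_t offset = std140_align_up(*cursor, align);
    *cursor = offset + size;
    return offset;
}
```

Placing a float, a vec2, a vec3 and another float in that order yields offsets 0, 8, 16 and 28: padding appears only before the vec2 and the vec3, and the trailing float packs directly behind the vec3.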
The "Array vs Scalar" Padding Trap
This is where a subtle error can occur:
- If you declare float padding[3] in GLSL, std140 treats this as an array. Every array element must be aligned to 16 bytes. So padding[0] takes 16 bytes (4 useful + 12 empty), padding[1] takes 16 bytes, etc.
- Solution: Use individual scalar fields for GLSL padding (float _pad1; float _pad2; ...) so they are packed compactly (4 bytes each), matching exactly the packed float padding[3] in C.
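To put numbers on the trap, the sketch below computes where the next member would land after two padding floats that follow an 8-byte header (a uint and a float). The helper names are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stddef.h>

static_assert(sizeof(float) == 4, "this sketch assumes 4-byte floats");

/* Round offset up to the next multiple of align. */
static size_t align_up(size_t offset, size_t align)
{
    return (offset + align - 1) / align * align;
}

/* End offset of `count` floats declared as individual scalars in std140
 * (this is also how a packed C float array of the same length behaves). */
static size_t end_of_scalar_floats(size_t start, size_t count)
{
    return align_up(start, 4) + count * 4;
}

/* End offset of a std140 `float name[count]` array: the array starts on a
 * 16-byte boundary and every element occupies a 16-byte stride. */
static size_t end_of_std140_float_array(size_t start, size_t count)
{
    return align_up(start, 16) + count * 16;
}
```

With start = 8 and count = 2, the scalar version ends at offset 16, which is where the C struct places the next block. The array version ends at offset 48, so every later member would be read from the wrong location.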
Memory Visualization (std140)
3. Data Structure
C Structure (postprocess.h)
We use explicit padding to align logical blocks to 16 bytes, facilitating memory reading and debugging.
typedef struct {
    uint32_t active_effects; // 4 bytes
    float time;              // 4 bytes
    float _pad0[2];          // 8 bytes (padding to reach 16 bytes)

    /* Vignette (16 bytes) */
    float vignette_intensity;
    float vignette_smoothness;
    float vignette_roundness;
    float _pad1;             // 4 bytes

    // ... (Grain, Exposure, etc.)

    /* FXAA (16 bytes) */
    float fxaa_quality_subpix;
    float fxaa_quality_edge_threshold;
    float fxaa_quality_edge_threshold_min;
    float _pad10;

    /* 3D LUT (16 bytes) */
    float lut3d_intensity;
    float _pad11[3];
} PostProcessUBO_Layout;
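One way to guard this layout is with compile-time checks. The sketch below uses a reduced, hypothetical copy of the struct (DemoPostProcessUBO, header and vignette blocks only) because the full definition is elided above. It assumes a C11 compiler for static_assert.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Reduced, hypothetical copy of the layout: header block + vignette block. */
typedef struct {
    uint32_t active_effects;
    float    time;
    float    _pad0[2];

    float vignette_intensity;
    float vignette_smoothness;
    float vignette_roundness;
    float _pad1;
} DemoPostProcessUBO;

/* Each logical block must start on a 16-byte boundary. */
static_assert(offsetof(DemoPostProcessUBO, active_effects) == 0,
              "header block must start at offset 0");
static_assert(offsetof(DemoPostProcessUBO, vignette_intensity) == 16,
              "vignette block must start at offset 16");

/* The whole struct must be a multiple of 16 bytes. */
static_assert(sizeof(DemoPostProcessUBO) % 16 == 0,
              "UBO size must be a multiple of 16 bytes");
```

If a field is added without adjusting the padding, the build fails at these assertions, which is cheaper to diagnose than effects that silently read the wrong values at runtime.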
GLSL Block (ubo.glsl)
The block is declared with the std140 layout at binding point 0.
IMPORTANT: Padding fields must be declared individually!
LAYOUT_CONFIG(std140, binding = 0) uniform PostProcessBlock_Layout {
    uint activeEffects;
    float time;
    float _pad0_0; // Matches _pad0[0] in C
    float _pad0_1; // Matches _pad0[1] in C

    /* Vignette */
    float v_intensity;
    float v_smoothness;
    float v_roundness;
    float _pad1;

    // ... (Grain, Exposure, etc.)

    /* FXAA */
    float fxaaQualitySubpix;
    float fxaaQualityEdgeThreshold;
    float fxaaQualityEdgeThresholdMin;
    float _pad10;

    /* 3D LUT */
    float lut3d_intensity;
    float _pad11_0;
    float _pad11_1;
    float _pad11_2;
};
4. Adding a New Parameter
To add an effect or parameter, meticulously follow these steps:
1. Add to C struct (postprocess.h):
   - Add the field float my_param.
   - Adjust the padding array _padX so the total block size remains a multiple of 16 bytes (optional but recommended for organization).
2. Add to GLSL block (ubo.glsl):
   - Add the field float my_param.
   - Crucial: Update the scalar padding fields (_padX_0, etc.) to match the C offsets exactly.
3. Usage in shaders:
   - The file ubo.glsl is included via @header "postprocess/ubo.glsl".
   - Access my_param directly (the block makes members global).
   - For boolean flags, add a macro in ubo.glsl.
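For the C-side step, the sketch below shows two ways a new float can be fitted in without shifting any existing offsets. All struct names are hypothetical, the vignette block is used only as an example, and _padX is the same placeholder name as in the steps above. It assumes a C11 compiler.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical block before the change: three floats plus one pad. */
typedef struct {
    float vignette_intensity;
    float vignette_smoothness;
    float vignette_roundness;
    float _pad1;
} VignetteBlockBefore;

/* Option A: my_param takes over the 4 bytes that _pad1 occupied. */
typedef struct {
    float vignette_intensity;
    float vignette_smoothness;
    float vignette_roundness;
    float my_param;
} VignetteBlockAfter;

/* Option B: no pad slot is free, so a new 16-byte block is appended. */
typedef struct {
    float my_param;
    float _padX[3];
} MyParamBlock;

static_assert(sizeof(VignetteBlockAfter) == sizeof(VignetteBlockBefore),
              "reusing a pad slot must not change the block size");
static_assert(offsetof(VignetteBlockAfter, my_param) ==
              offsetof(VignetteBlockBefore, _pad1),
              "my_param must sit exactly where _pad1 was");
static_assert(sizeof(MyParamBlock) % 16 == 0,
              "a new block must be a multiple of 16 bytes");
```

In both cases the GLSL block needs the mirror change: rename the scalar pad to my_param for option A, or add float my_param followed by three individual scalar pads for option B.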
5. Performance
Switching to the UBO removed the following per-frame work:
- glUseProgram (for uploads)
- ~30-50 calls to glUniform1f / glUniform1i
- The driver validation associated with each call.
Now, postprocess_end() does only two things:
1. Copies the data into a local struct (on the stack).
2. Makes a single call to glBufferSubData.
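The two steps can be sketched as follows. To keep the example runnable without a GL context, the upload goes through a function pointer whose parameters mirror glBufferSubData(target, offset, size, data). Everything named Demo*, demo_* or recording_* is an assumption for this illustration, not the engine's actual code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Reduced, hypothetical UBO: header block only. */
typedef struct {
    uint32_t active_effects;
    float    time;
    float    _pad0[2];
} DemoUBO;

static_assert(sizeof(DemoUBO) == 16, "header block must be 16 bytes");

/* Stand-in with the same parameter shape as glBufferSubData. */
typedef void (*BufferSubDataFn)(unsigned target, ptrdiff_t offset,
                                ptrdiff_t size, const void *data);

/* Recording stub used in place of the real GL call. */
static int           g_upload_calls;
static size_t        g_last_size;
static unsigned char g_last_bytes[sizeof(DemoUBO)];

static void recording_upload(unsigned target, ptrdiff_t offset,
                             ptrdiff_t size, const void *data)
{
    (void)target;
    (void)offset;
    g_upload_calls++;
    g_last_size = (size_t)size;
    if ((size_t)size <= sizeof g_last_bytes)
        memcpy(g_last_bytes, data, (size_t)size);
}

/* Step 1: fill a local struct on the stack. Step 2: one upload call. */
static void demo_postprocess_end(BufferSubDataFn upload, unsigned target,
                                 uint32_t active_effects, float time)
{
    DemoUBO ubo;
    memset(&ubo, 0, sizeof ubo); /* keep padding bytes deterministic */
    ubo.active_effects = active_effects;
    ubo.time = time;
    upload(target, 0, (ptrdiff_t)sizeof ubo, &ubo);
}
```

In a real renderer the callback would be glBufferSubData itself, called with the uniform buffer bound. The point of the sketch is only that the number of upload calls stays at one per frame no matter how many parameters the struct holds.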