Shader Cross-GPU Compatibility Guidelines¶

Case Study: See gpu-rendering-synchronization.md for a real-world example of Intel vs NVIDIA fixes

Overview¶

Best practices for writing OpenGL shaders that produce consistent results across GPU vendors (Intel, NVIDIA, AMD).

Key Principles¶

1. Avoid Relying on Derivative Precision¶

Problem: dFdx() and dFdy() have vendor-specific implementations.

Recommendation: Use derivatives only for debugging/visualization, not critical rendering logic.

Alternative: Pre-compute values in geometry shaders or use distance-based heuristics.

2. Pre-Calculate and Store Values¶

Problem: Recalculating the same value multiple times can accumulate precision errors differently.

Solution: Calculate once, store in unused texture channels (e.g., alpha).

Example:

// Fragment shader (once)
float luma_val = dot(sqrt(color_val), vec3(0.2126, 0.7152, 0.0722));
FragColor_out = vec4(color_val, luma_val);  // Store in alpha

// Post-processing (reuse)
float stored_luma = texture(tex_sampler, uv_coords).a;  // Consistent across vendors

3. Use Explicit Precision Qualifiers (Mobile/WebGL)¶

#ifdef GL_ES
precision highp float;
precision highp int;
#endif

Note: Desktop OpenGL ignores these, but they're critical for mobile/WebGL.

4. Clamp Intermediate Values¶

Prevent overflow/underflow in calculations:

// Bad
float val_bad = pow(someInput, 0.1);

// Good
float val_good = pow(clamp(someInput, 0.0, 10.0), 0.1);

5. Avoid Fast Math Assumptions¶

Use explicit parentheses to control floating-point operation order:

// Order-dependent
float result_val = ((a * b) + (c * d)) + (e * f);

Common Pitfalls¶

Pitfall 1: Texture LOD Calculation¶

// Implicit LOD may differ
vec3 color_res = texture(envMap_tex, direction).rgb;

// Explicit LOD is consistent
vec3 color_res_fixed = textureLod(envMap_tex, direction, roughness * 4.0).rgb;

Pitfall 2: Small Exponents in pow()¶

// Very sensitive to input differences
float bad_pow = pow(variation, 0.1);   // 10th root

// More stable
float better_sqrt = sqrt(variation);     // Square root

Pitfall 3: Derivatives in Divergent Branches¶

// GOOD: Compute before branching
float dx_precomputed = dFdx(value_val);
if (someCondition) {
    // Use dx_precomputed
}

Pitfall 4: Mixed Bitwise Operator Types¶

Problem: Some drivers do not support implicit int -> uint conversions for bitwise operators (&, |, ^).

Recommendation: Always use explicit literals (e.g., 0x1u) or casts when mixing types in bitwise operations.

Example:

// Warning/Error on some drivers
uint result = some_uint & 0x1;

// Portable
uint result = some_uint & 0x1u;
// OR
uint result = some_uint & uint(some_int);

Testing Workflow¶

Visual Comparison: Test on 2+ GPU vendors
Pixel Diff: Use image comparison tools (see Visual Testing Artifacts)
Frame Capture: Compare with RenderDoc/ApiTrace
Driver Versions: Test with different driver releases

Tools¶

# Visual diff
compare intel.png nvidia.png diff.png

# ApiTrace
apitrace trace ./app
apitrace replay app.trace

When to Use Derivatives¶

✅ Safe uses:

Debug visualization (where exact parity is not required)
Non-critical effects (optional grain, etc.)
Explicitly documented vendor-specific behavior

❌ Avoid for:

Core rendering logic where cross-GPU "Difference Maps" must be clean
Anti-aliasing (prefer FXAA or Analytic smoothing)
Material property adjustments (use MIN_ROUGHNESS constants)

Implementation Example: Analytic Edge Smoothing¶

Instead of fwidth(h):

// h = discriminant (0 at edge)
// analyticFwidth = footprint of pixel in h-space
float edgeFactor_val = clamp(h / analyticFwidth, 0.0, 1.0);

See pbr_ibl_billboard.frag for a production example.

More details on why testing artifacts appear in CI can be found in Visual Testing & Regression Artifacts.