# Profiling Guide: ApiTrace & Trace Analysis
This guide explains how to use ApiTrace in combination with our custom analysis tool to measure GPU performance, with a focus on overcoming driver-specific profiling limitations for compute shaders.
## Why use this workflow?
Standard GPU profilers (and even `apitrace replay --pgpu`) often report 0.00 ms for `glDispatchCompute` calls on many drivers/architectures, because they rely on driver-internal counters that don't always track compute work asynchronously or accurately.
Our engine injects manual `GL_TIMESTAMP` queries and `glPushDebugGroup` markers into the command stream. The `trace_analyze.py` tool correlates these timestamps with the trace dump to provide a "source of truth" for compute shaders.
## 1. Prerequisites
Ensure you have ApiTrace installed on your system.
The Makefile expects `apitrace` to be in your `PATH`, or you can point to a specific binary using `APITRACE_BIN`.
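A quick sanity check before recording. Note that whether `APITRACE_BIN` is read from the environment or passed as a make variable depends on the Makefile, and the binary path below is hypothetical:

```shell
# Verify apitrace is reachable; print a hint if it is not.
command -v apitrace || echo "apitrace not found in PATH"

# Optionally point the build at a specific binary (hypothetical path).
export APITRACE_BIN=/opt/apitrace/bin/apitrace
```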
## 2. Generating a Trace
To record the execution of the application, run the profiling target from the Makefile. This will:
- Build the application in "Profile" mode (without high-level debug overhead).
- Use `apitrace trace` to record all OpenGL calls into `build-profile/app.trace`.
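For reference, the target wraps an invocation roughly equivalent to the following; the application binary path is an assumption, so check the Makefile for the exact command:

```shell
apitrace trace --output build-profile/app.trace build-profile/app
```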
## 3. High-Level Performance Analysis
The easiest way to see the results is to run the integrated analysis target from the Makefile. It invokes `scripts/trace_analyze.py`, which produces two tables:
- Performance by Shader (Cumulative): Groups all calls by shader name/label.
- Debug Groups (Per Instance): Shows the chronological execution of marked blocks (e.g., IBL generation steps).
### Understanding the Metrics
| Column | Description |
|---|---|
| `GPU [ms]` | Driver-reported duration (standard `glDraw*` calls). Use this for vertex/fragment shaders. |
| `Timer [ms]` | Manual `GL_TIMESTAMP` markers. Use this for compute shaders and IBL generation blocks. |
| `Avg/Fr[ms]` | Cumulative GPU time divided by the total number of frames in the trace. |
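As a concrete reading of the last row: `Avg/Fr[ms]` is simply cumulative time divided by frame count. A minimal sketch with made-up numbers:

```shell
# 12.6 ms of cumulative GPU time over 300 frames -> 0.042 ms per frame.
awk 'BEGIN { printf "%.3f\n", 12.6 / 300 }'
```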
## 4. Advanced Tool Usage
You can run the script manually for more control:
```shell
# Usage: python3 scripts/trace_analyze.py <trace_file> [apitrace_bin]
python3 scripts/trace_analyze.py build-profile/app.trace
```
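The optional second argument selects a specific apitrace binary instead of the one on `PATH` (the path below is hypothetical):

```shell
python3 scripts/trace_analyze.py build-profile/app.trace /opt/apitrace/bin/apitrace
```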
### Script Logic
- Regex Parsing: The script parses `apitrace dump` output to extract `glObjectLabel` (to name shaders) and `glGetQueryObjectui64v` (to get manual timestamps).
- Matching: It looks for query-result fetches that happen immediately after a debug group ends to measure that group's duration.
- Nested Sums: If a parent debug group doesn't have its own timer, the script sums the durations of its direct children (marked with `*` in the table).
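A self-contained sketch of the extraction step. The call formatting mimics `apitrace dump` output, but the sample lines and values are fabricated for illustration:

```shell
# Fake three-line excerpt in the style of `apitrace dump`.
cat > /tmp/dump_excerpt.txt <<'EOF'
1234 glObjectLabel(identifier = GL_PROGRAM, name = 7, length = -1, label = "IBL Prefilter")
1240 glPopDebugGroup()
1241 glGetQueryObjectui64v(id = 3, pname = GL_QUERY_RESULT, params = &412345)
EOF

# 1) Shader names come from glObjectLabel calls.
grep -o 'label = "[^"]*"' /tmp/dump_excerpt.txt

# 2) A timestamp fetch directly after a debug group ends is attributed to that group.
grep -A1 'glPopDebugGroup' /tmp/dump_excerpt.txt | grep -c 'glGetQueryObjectui64v'
```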
## 5. Coding for Profiling
To make a new feature measurable:
- Label your shaders: Use `shader_set_label(shader, "Description")` in C.
- Add debug groups.
- Add fine-grained timers (optional).
The engine automatically attempts to capture timestamps around critical IBL sections. See `src/pbr.c` for examples.
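Once labels and groups are in place, you can confirm they actually land in the command stream by grepping the dump (the filter below is illustrative):

```shell
apitrace dump build-profile/app.trace | grep -E 'glObjectLabel|glPushDebugGroup'
```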
## 6. Automated Performance Regression Detection (CI/CD)
To prevent performance regressions from entering the codebase, we use automated ApiTrace analysis:
- Unit-Level (`just test-apitrace`): Records a trace of the integration suite (`test_app`) and greps for "performance issue" or "stall" messages.
- Full-Integration (`just test-integration-apitrace`): Records a trace of the main application while simulating user input via `xdotool`. This is the most comprehensive check, covering environment swaps, post-processing toggles, and UI interactions.
If these tests fail with `❌ Validation failed: Performance issues found in trace`, it usually indicates that a new GPU stall has been introduced.
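The validation boils down to a pattern search over the replay output. A self-contained sketch with a fabricated log (the real targets and exact messages may differ):

```shell
# Fake replay log containing one offending message.
cat > /tmp/replay.log <<'EOF'
Rendered 300 frames in 5.1 seconds
warning: performance issue: pixel readback caused a pipeline stall
EOF

# Fail the check if either keyword appears (case-insensitive).
if grep -Eiq 'performance issue|stall' /tmp/replay.log; then
  echo "Validation failed: performance issues found in trace"
fi
```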
## 7. Developing the Tool
The analysis script is fully tested:
- Linting: `make lint` (uses Ruff).
- Formatting: `make format` (uses Ruff).
- Tests: `make test-python` (uses Pytest).
- Coverage: `make coverage` (generates an HTML report for the Python logic).