# Profiling Guide: ApiTrace & Trace Analysis
This guide explains how to use ApiTrace in combination with our custom analysis tool to measure GPU performance, with a focus on overcoming driver-specific profiling limitations for compute shaders.
## Why use this workflow?
Standard GPU profilers (and even `apitrace replay --pgpu`) often report 0.00 ms for `glDispatchCompute` calls on many drivers/architectures, because they rely on driver-internal counters that don't always track compute work asynchronously or accurately.
Our engine injects manual `GL_TIMESTAMP` queries and `glPushDebugGroup` markers into the command stream. The `trace_analyze.py` tool correlates these timestamps with the trace dump to provide a "source of truth" for compute shaders.
## 1. Prerequisites
Ensure you have ApiTrace installed on your system.
The Makefile expects `apitrace` to be in your `PATH`, or you can point to a specific binary using `APITRACE_BIN`.
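A quick sanity check before recording. Note that whether `APITRACE_BIN` is read from the environment or passed as a make variable depends on the Makefile, and the binary path below is hypothetical:

```shell
# Verify apitrace is reachable; print a hint if it is not.
command -v apitrace || echo "apitrace not found in PATH"

# Optionally point the build at a specific binary (hypothetical path).
export APITRACE_BIN=/opt/apitrace/bin/apitrace
```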
## 2. Generating a Trace
To record the execution of the application, run the profiling target from the Makefile. This will:
- Build the application in "Profile" mode (without high-level debug overhead).
- Use `apitrace trace` to record all OpenGL calls into `build-profile/app.trace`.
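For reference, the target wraps an invocation roughly equivalent to the following; the application binary path is an assumption, so check the Makefile for the exact command:

```shell
apitrace trace --output build-profile/app.trace build-profile/app
```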
## 3. High-Level Performance Analysis
The easiest way to see the results is to run the integrated analysis target from the Makefile. It invokes `scripts/trace_analyze.py`, which produces two tables:
- Performance by Shader (Cumulative): Groups all calls by shader name/label.
- Debug Groups (Per Instance): Shows the chronological execution of marked blocks (e.g., IBL generation steps).
### Understanding the Metrics
| Column | Description |
|---|---|
| `GPU [ms]` | Driver-reported duration (standard `glDraw*` calls). Use this for vertex/fragment shaders. |
| `Timer [ms]` | Manual `GL_TIMESTAMP` markers. Use this for compute shaders and IBL generation blocks. |
| `Avg/Fr[ms]` | Cumulative GPU time divided by the total number of frames in the trace. |
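As a concrete reading of the last row: `Avg/Fr[ms]` is simply cumulative time divided by frame count. A minimal sketch with made-up numbers:

```shell
# 12.6 ms of cumulative GPU time over 300 frames -> 0.042 ms per frame.
awk 'BEGIN { printf "%.3f\n", 12.6 / 300 }'
```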
## 4. Advanced Tool Usage
You can run the script manually for more control:
```shell
# Usage: python3 scripts/trace_analyze.py <trace_file> [apitrace_bin]
python3 scripts/trace_analyze.py build-profile/app.trace
```
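The optional second argument selects a specific apitrace binary instead of the one on `PATH` (the path below is hypothetical):

```shell
python3 scripts/trace_analyze.py build-profile/app.trace /opt/apitrace/bin/apitrace
```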
### Script Logic
- Regex Parsing: The script parses `apitrace dump` output to extract `glObjectLabel` (to name shaders) and `glGetQueryObjectui64v` (to get manual timestamps).
- Matching: It looks for query-result fetches that happen immediately after a debug group ends to measure that group's duration.
- Nested Sums: If a parent debug group doesn't have its own timer, the script sums the durations of its direct children (marked with `*` in the table).
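A self-contained sketch of the extraction step. The call formatting mimics `apitrace dump` output, but the sample lines and values are fabricated for illustration:

```shell
# Fake three-line excerpt in the style of `apitrace dump`.
cat > /tmp/dump_excerpt.txt <<'EOF'
1234 glObjectLabel(identifier = GL_PROGRAM, name = 7, length = -1, label = "IBL Prefilter")
1240 glPopDebugGroup()
1241 glGetQueryObjectui64v(id = 3, pname = GL_QUERY_RESULT, params = &412345)
EOF

# 1) Shader names come from glObjectLabel calls.
grep -o 'label = "[^"]*"' /tmp/dump_excerpt.txt

# 2) A timestamp fetch directly after a debug group ends is attributed to that group.
grep -A1 'glPopDebugGroup' /tmp/dump_excerpt.txt | grep -c 'glGetQueryObjectui64v'
```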
## 5. Coding for Profiling
To make a new feature measurable:
- Label your shaders: Use `shader_set_label(shader, "Description")` in C.
- Add debug groups.
- Add fine-grained timers (optional).
The engine automatically attempts to capture timestamps around critical IBL sections. See `src/pbr.c` for examples.
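Once labels and groups are in place, you can confirm they actually land in the command stream by grepping the dump (the filter below is illustrative):

```shell
apitrace dump build-profile/app.trace | grep -E 'glObjectLabel|glPushDebugGroup'
```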
## 6. Automated Performance Regression Detection (CI/CD)
To prevent performance regressions from entering the codebase, we use automated ApiTrace analysis:
- Unit-Level (`just test-apitrace`): Records a trace of the integration suite (`test_app`) and greps for "performance issue" or "stall" messages.
- Full-Integration (`just test-integration-apitrace`): Records a trace of the main application while simulating user input via `xdotool`. This is the most comprehensive check, covering environment swaps, post-processing toggles, and UI interactions.
If these tests fail with `❌ Validation failed: Performance issues found in trace`, it usually indicates that a new GPU stall has been introduced.
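The validation boils down to a pattern search over the replay output. A self-contained sketch with a fabricated log (the real targets and exact messages may differ):

```shell
# Fake replay log containing one offending message.
cat > /tmp/replay.log <<'EOF'
Rendered 300 frames in 5.1 seconds
warning: performance issue: pixel readback caused a pipeline stall
EOF

# Fail the check if either keyword appears (case-insensitive).
if grep -Eiq 'performance issue|stall' /tmp/replay.log; then
  echo "Validation failed: performance issues found in trace"
fi
```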
## 7. Developing the Tool
The analysis script is fully tested:
- Linting: `make lint` (uses Ruff).
- Formatting: `make format` (uses Ruff).
- Tests: `make test-python` (uses Pytest).
- Coverage: `make coverage` (generates an HTML report for the Python logic).