Debugging Vulkan applications, rendering and general GPU state via capture-based debuggers works great. The most commonly used open-source graphics debugger is RenderDoc: it can capture a rendered frame, or a manually recorded range of commands submitted to the device, and then lets you inspect it in full detail, including the resources used, the executed commands and their effect on GPU state.

Additionally, many game engines allow quick and easy introspection by exposing debugging user interfaces for digging into engine state, e.g. showing the contents of specific textures or framebuffers on screen, or showing how much memory is consumed or how long certain commands take to execute on the GPU. For examples, see [1] and [2]. Quick access to and visualization of application state via an in-application GUI can make debugging easier and helps find problems that are otherwise hard to detect.

Let’s combine these two approaches and allow as much graphics introspection as possible, live in the application. Since graphics introspection deals with state at the Vulkan API level, it can be done in an engine-neutral and fully reusable Vulkan layer: a layer that automatically provides introspection of logical CPU state, such as layouts or recorded command buffers, as well as device state, such as buffer and texture contents. A layer that just works, without any integration effort, for all Vulkan applications, no matter what they are: simple samples and experiments, full-blown rendering engines, compute-only applications and all other kinds of programs using Vulkan. And, due to the nature of Vulkan layers, this even applies to already shipped applications, where modifying the code isn’t possible at all.

That’s VIL, the Vulkan Introspection Layer (GitHub).

Current features:

  • See a list of all live Vulkan resources, such as Images, CommandBuffers, DescriptorSets, Queues and Fences. They can be identified by the names the application assigns via VK_EXT_debug_utils.
    • See the resources’ creation parameters as well as their current state.
    • For buffers and images, the current content can be displayed.
    • Resources generally link to related resources, e.g. it is possible to see all image views of an image or all resources bound to a certain VkDeviceMemory object.
  • View every command of a recorded command buffer, or show all commands submitted to the device between two present calls, similar to per-frame captures in RenderDoc.
    • Every command can be inspected in detail, including all its parameters.
    • The execution of those command buffers can be inspected as well: VIL can show the execution time of a selected command, including whole render passes or sections delimited by Vulkan debug labels.
    • All resources used by a transfer/draw/dispatch command can be viewed along with their contents. For instance, it is possible to see how a draw command changed a framebuffer attachment, what values a uniform buffer holds during a specific draw command or what the state of a storage buffer is after a dispatch command.
    • For draw commands, the vertex state can be captured as well: both the data stored in vertex buffers and the data output by the vertex shader can be captured and displayed.
  • Show general application info, such as used extensions and enabled features.
  • Show details regarding the memory consumption of the application.

And all of this works live. No need to take captures, no need to load them. When you edit a shader and reload a pipeline, the displayed state is updated instantly with your application’s next device submissions.

While most Vulkan core features and many extensions are supported, some are still missing; see features.md. VIL already supports viewing acceleration structures, which makes debugging Vulkan ray tracing easier.

Comparison to capture-based debugging

The nature of live introspection brings advantages and disadvantages in comparison to traditional capture-based GPU debugging.

Analyzing a specific workload submitted to the GPU (such as a single frame) works well with GPU captures but can be significantly harder with live introspection: by the time you view the commands, some resources might already have been destroyed. For commands that are only submitted once, you won’t be able to introspect the associated GPU state at all, since the way VIL captures state relies on commands being submitted again and again. This assumption usually holds for applications that render frames or submit similar compute workloads over and over, but might not be true for all use cases. Furthermore, several features offered by traditional GPU debuggers, such as shader debugging or pixel histories, are not present in VIL. Some useful features are also hard to combine with the live nature of VIL and might never be implemented. Related to this, VIL uses heuristics to find a selected command again in new submissions. Although this works in most observed applications, these heuristics may break and make debugging hard. VIL just shows the submissions that applications make, and if they vary significantly from frame to frame, the UI can become quite unstable. VIL has concepts to “Freeze the shown commands” or “Freeze the shown state”, but getting used to these concepts might take a while.
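
To illustrate why re-finding a selected command is a heuristic problem, here is a toy scoring approach. It is a hypothetical sketch, not VIL’s actual algorithm. The `Cmd` record, the weights and the names `similarity` and `find_again` are all invented for illustration. The idea is that a command of the same type, at the same place in the debug-label hierarchy and with similar parameters, is probably "the same" command in the next submission.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass(frozen=True)
class Cmd:
    """Hypothetical, simplified record of one recorded command.

    This is NOT VIL's data model; it only keeps what the toy heuristic needs.
    """
    name: str                     # command type, e.g. "vkCmdDraw"
    label_path: Tuple[str, ...]   # enclosing debug labels, outermost first
    params: Tuple[int, ...]       # e.g. (vertexCount, instanceCount)


def similarity(a: Cmd, b: Cmd) -> float:
    """Score how likely `b` is the same command as `a`; 0.0 means no match."""
    if a.name != b.name:
        return 0.0  # different command types never match
    score = 1.0
    if a.label_path == b.label_path:
        score += 2.0  # same place in the debug-label hierarchy: strong hint
    # Fraction of identical parameters: only a weak hint, since values such
    # as instance counts may legitimately change from frame to frame.
    same = sum(x == y for x, y in zip(a.params, b.params))
    score += same / max(len(a.params), 1)
    return score


def find_again(selected: Cmd, submission: Sequence[Cmd]) -> Optional[int]:
    """Return the index of the best match in a new submission, or None."""
    best_idx: Optional[int] = None
    best_score = 0.0
    for i, cmd in enumerate(submission):
        score = similarity(selected, cmd)
        if score > best_score:  # ties keep the earlier command
            best_idx, best_score = i, score
    return best_idx


# Usage: the draw selected in the UI last frame, and this frame's commands.
selected = Cmd("vkCmdDraw", ("GBuffer",), (36, 1))
frame = [
    Cmd("vkCmdDispatch", ("Lighting",), (8, 8, 1)),
    Cmd("vkCmdDraw", ("Shadows",), (36, 1)),
    Cmd("vkCmdDraw", ("GBuffer",), (36, 2)),  # instance count changed
]
print(find_again(selected, frame))  # prints 2
```

Even this toy version shows the instability problem. If an application reorders, relabels or restructures its draws between frames, the best-scoring candidate can jump to a different command.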

On the other hand, using VIL can be more comfortable when a quick first overview is desired or when debugging would require many consecutive GPU captures anyway. For applications that submit a lot of work, repeatedly taking a capture, loading it and finding the command to debug can be time-consuming and brings significant memory overhead; this is especially true for modern, low-level graphics APIs such as Vulkan. In heavy applications, I’ve waited minutes for captures to be taken and then minutes again for them to load, and that only if my computer wasn’t out of memory at that point. Furthermore, capture-based debugging has inherent problems that VIL does not have. VIL lets you view the state and contents of resources at any time, and it shows the actual state, not the state from a replay of the same commands. While this difference is usually not too interesting, it can become significant in the face of synchronization issues. I’ve seen several cases where certain rendering sequences showed problems that were suddenly gone in a replayed capture. In VIL, the state you see in the GUI is always the state used while executing your commands. Of course, VIL influences the GPU synchronization of your program as well. But this means that when you select a problematic command and the additional synchronization VIL inserts for state capture happens to fix synchronization issues in your application, those issues are fixed in the submitted commands as well, i.e. the rendering itself will be fixed. While this might not seem much better at first glance, it can actually be a useful debugging aid: selecting specific commands inserts heavy barriers, so if selecting a command in VIL fixes issues with your rendering/compute workload, it is a sign that your application is missing synchronization.
This way of debugging is not recommended, though: synchronization in Vulkan can be debugged effectively with the validation layers and their synchronization validation.

Another advantage of VIL related to its live nature is the ability to capture real timings. Captures can show timings too, but they might be less accurate since they time the replayed commands. Furthermore, continuously showing timings can be useful, e.g. to see how the timings of render passes in a game engine change depending on where the camera is looking. VIL implements this: just select any command and VIL will query its execution time using GPU timestamps. As usual with GPU timings, they are not trivial to interpret; keep in mind that measuring the time of a single draw command is not very meaningful due to the pipelined nature of GPUs. But when you select a vkCmdBeginDebugUtilsLabelEXT or vkCmdBeginRenderPass command, VIL shows the timing of the command and all its children (i.e. the commands between the begin and the matching end).
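
The bookkeeping behind such section timings can be sketched in a few lines. This is a hedged illustration, not VIL’s code. Raw GPU timestamps (e.g. written with vkCmdWriteTimestamp) are tick counts. VkPhysicalDeviceLimits::timestampPeriod gives the number of nanoseconds per tick. VkQueueFamilyProperties::timestampValidBits says how many low bits of a timestamp are meaningful. The helper names `matching_end` and `ticks_to_ms` are hypothetical. The sketch also assumes, as a simplification, that begin/end pairs of the same kind are properly nested within one flat list of commands.

```python
from typing import List, Optional

# Map each "begin" command to the "end" command that closes its section.
SECTION_END = {
    "vkCmdBeginDebugUtilsLabelEXT": "vkCmdEndDebugUtilsLabelEXT",
    "vkCmdBeginRenderPass": "vkCmdEndRenderPass",
}


def matching_end(cmds: List[str], begin_idx: int) -> Optional[int]:
    """Index of the command closing the section opened at begin_idx.

    Simplification: assumes begin/end pairs of the same kind are properly
    nested within `cmds`. Returns None if the section is not closed there.
    """
    begin = cmds[begin_idx]
    end = SECTION_END[begin]
    depth = 0
    for i in range(begin_idx, len(cmds)):
        if cmds[i] == begin:
            depth += 1  # a nested section of the same kind opens
        elif cmds[i] == end:
            depth -= 1
            if depth == 0:
                return i
    return None


def ticks_to_ms(begin_ticks: int, end_ticks: int,
                timestamp_period_ns: float, valid_bits: int = 64) -> float:
    """Convert two raw timestamps into a duration in milliseconds.

    timestamp_period_ns: VkPhysicalDeviceLimits::timestampPeriod (ns per tick).
    valid_bits: VkQueueFamilyProperties::timestampValidBits of the queue.
    """
    if not 0 < valid_bits <= 64:
        raise ValueError("invalid timestampValidBits (0 means no timestamps)")
    mask = (1 << valid_bits) - 1
    # Only the low valid_bits bits are meaningful; masking the difference
    # also tolerates a single wrap-around of the counter.
    delta = (end_ticks - begin_ticks) & mask
    return delta * timestamp_period_ns / 1e6


# Usage: a label section containing a render pass and a nested label.
cmds = [
    "vkCmdBeginDebugUtilsLabelEXT",  # 0: outer label
    "vkCmdBeginRenderPass",          # 1
    "vkCmdDraw",                     # 2
    "vkCmdEndRenderPass",            # 3
    "vkCmdBeginDebugUtilsLabelEXT",  # 4: nested label
    "vkCmdDispatch",                 # 5
    "vkCmdEndDebugUtilsLabelEXT",    # 6: closes the nested label
    "vkCmdEndDebugUtilsLabelEXT",    # 7: closes the outer label
]
print(matching_end(cmds, 0))                # prints 7
print(ticks_to_ms(1_000, 1_501_000, 1.0))   # prints 1.5
```

A section’s duration would then come from one timestamp written before its begin command and one written after its matching end command. This is exactly why timing a whole label or render pass is more meaningful than timing a single draw.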

So in general, VIL does not aim to replace capture-based debugging. In my experience, VIL is more comfortable than capture-based debuggers for quick insights or a general overview, but for heavy debugging work (requiring a single faulty frame and digging deep into the shaders), or for applications that just don’t work well with VIL’s assumptions, GPU captures are the way to go.

Future improvements

There is still a lot to do, and there are still many problems that will be fixed in upcoming releases:

  • Finish support for core Vulkan up to version 1.3
  • The UI needs a lot of improvements, especially the resource viewer and memory overview.
  • Better viewers for image and buffer content.
  • Improve performance and memory overhead. Both are already acceptable in multiple tested applications (including performance-heavy games from recent years), meaning that applications generally still run smoothly with the layer. But for some applications, there is still noticeable overhead from bottlenecks in the layer that can be optimized.

All feedback, ideas, questions and requests are appreciated. Feel free to start discussions on the GitHub page or to email me.