Journey Through Vulkan
As any Vulkan article does, I will begin with a lament about the complexity of Vulkan and the number of concepts it involves. I wanted to learn Vulkan and had average OpenGL knowledge. To make the switch, it was only necessary to know how to do the following:
- Select a hardware device.
- Set up and configure application instances and extensions.
- Set up validation layers, link external debug utility calls, enable debug logs.
- Inspect which GPU queue families are supported.
- Create and allocate graphics and presentation queues.
- Set up the command pool and command buffers.
- Set up window surface integration with winapi.
- Set up image generation, allocation, layout transitions, and image views.
- Set up the swap chain with associated images, image views, depth images, and depth image views.
- Configure the render pass, subpasses, subpass synchronization dependencies, and attachments.
- Generate and allocate frame buffers, and link each frame buffer to the render pass, image views, and swap chain.
- Allocate and initialize vertex, index, uniform, and input buffers.
- Load, compile, and set up shader modules.
- Configure descriptor pools, descriptor (write) sets, descriptor attributes, descriptor bindings, and shader input and output.
- Configure the pipeline with all previous steps.
- Set up fences, semaphores, and pipeline barriers for pipeline synchronization.
- Dispose of and destroy allocated memory, and recreate the swap chain when a resize happens.
- And at last, create your render pipeline, record the commands into the command buffer, and display the result on the screen.
- Oh, and make it compile, of course.
Once you have all this, you can render the equivalent of the OpenGL hello-world triangle.
Tooling and Libraries
For this project, I decided to use Rust. There are several Rust libraries with Vulkan FFI bindings. The well-known ones are Vulkano and Ash.
Vulkano has a number of ready-made abstractions for Vulkan: abstractions over shader modules, image view generation for images, buffers, swap chains, render passes, and so on. With this API you can get something working relatively quickly. While this might be useful to reduce development time, I experienced the abstractions as an obstacle in my learning process. The reason is that C++ tutorials, documentation, and specs cannot be transferred 1-to-1 to Vulkano. Also, I did not like the macro for loading shaders. Vulkano uses macros to generate Rust modules at compile time, with the result that I could not refer to these generated types in the source code. I found it difficult to integrate into my engine API and decided to look at purer Vulkan FFI bindings.
‘Ash’ is another popular bindings library. After a short introduction, I really liked this API, even more than the C++ library. For example, constants are wrapped in enum types without prefixes or postfixes, collections are passed in as slices instead of raw pointers, and it provides builder types to easily construct structs.
Ash provides a builder type that follows the builder pattern, so the user only specifies the options they need to change. The builder has default values, so the user doesn't have to specify the `s_type`, `p_next`, empty pointers, nullable fields, and default values. It's very minimalistic.
With the builder, `attachments` takes a slice and sets both the pointer and the count fields. This applies wherever a pointer to a collection is expected, which allows simple usage instead of creating raw pointers by hand in Rust.
.blend_constants([0.0, 0.0, 0.0, 0.0])
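To make the mechanics concrete, here is a minimal, dependency-free sketch of those two ideas: defaults filled in by a builder, and a slice setter that writes both the pointer and the count. It does not use Ash itself. `ColorBlendState`, `ColorBlendStateBuilder`, `Attachment`, and the tag value `STRUCTURE_TYPE_COLOR_BLEND_STATE` are hypothetical stand-ins invented for this illustration.

```rust
use std::marker::PhantomData;
use std::os::raw::c_void;
use std::ptr;

/// Hypothetical stand-in for a per-attachment blend description.
#[repr(C)]
#[derive(Clone, Copy, Debug, Default, PartialEq)]
pub struct Attachment {
    pub blend_enable: u32,
}

/// Hypothetical FFI-style struct, laid out the way a C API would expect it.
#[repr(C)]
pub struct ColorBlendState {
    pub s_type: u32,
    pub p_next: *const c_void,
    pub attachment_count: u32,
    pub p_attachments: *const Attachment,
    pub blend_constants: [f32; 4],
}

/// Made-up structure-type tag, used only in this sketch.
pub const STRUCTURE_TYPE_COLOR_BLEND_STATE: u32 = 1;

/// Builder that borrows the attachment slice for the lifetime `'a`.
pub struct ColorBlendStateBuilder<'a> {
    inner: ColorBlendState,
    _marker: PhantomData<&'a [Attachment]>,
}

impl<'a> ColorBlendStateBuilder<'a> {
    /// Fills in the tag, null pointers, and zeroed defaults.
    pub fn new() -> Self {
        Self {
            inner: ColorBlendState {
                s_type: STRUCTURE_TYPE_COLOR_BLEND_STATE,
                p_next: ptr::null(),
                attachment_count: 0,
                p_attachments: ptr::null(),
                blend_constants: [0.0; 4],
            },
            _marker: PhantomData,
        }
    }

    /// Takes a slice and sets both the pointer and the count field.
    pub fn attachments(mut self, attachments: &'a [Attachment]) -> Self {
        self.inner.attachment_count = attachments.len() as u32;
        self.inner.p_attachments = attachments.as_ptr();
        self
    }

    pub fn blend_constants(mut self, constants: [f32; 4]) -> Self {
        self.inner.blend_constants = constants;
        self
    }

    /// Returns the raw struct; from here on the caller must keep the
    /// borrowed slice alive, because the lifetime is no longer tracked.
    pub fn build(self) -> ColorBlendState {
        self.inner
    }
}
```

Calling `ColorBlendStateBuilder::new().attachments(&attachments).build()` yields a struct whose tag and `p_next` are already set, which is the convenience described above.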
Nsight is a great tool for debugging frames. You can inspect each pipeline stage, read internal GPU state, view executed GPU commands, profile, inspect buffers, read out uniforms, and so on. If you have an NVIDIA Turing GPU you can also debug in real time; otherwise, you can only inspect one frame at a time.
For Vulkan, I wrote a profiler. Google trace is used to visualize the data. Initially, I tried to use InfluxDB, but the simplicity of Google trace convinced me to use it instead. It lets you see, down to the nanosecond, how long code blocks take to execute. My implementation uses a procedural macro attribute and a normal macro. By means of a feature flag, the macros can be turned on or off. The `profile` attribute above a function measures the duration of that function, and a `profile!("block")` macro measures a specific scope. Both macros create an instance of a profile monitor struct, which is dropped at the end of the scope. When it is dropped, it notes the time elapsed since the object was created and uses that information for Google trace.
It is also possible to wrap code in the `profile_fn!` macro; a process similar to that of the attribute then takes place.
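The scope-based measurement described above can be sketched with a guard type that records its lifetime when it is dropped. This is not the engine's actual code: `ProfileGuard`, `TraceEvent`, `EVENTS`, and `to_chrome_trace_json` are hypothetical names, the feature flag is left out for brevity, and the output assumes the Chrome trace-event JSON format (complete events with `"ph":"X"` and microsecond timestamps) is what the trace viewer reads.

```rust
use std::sync::{Mutex, OnceLock};
use std::time::Instant;

/// One finished measurement, in microseconds relative to the first guard.
#[derive(Clone, Debug)]
pub struct TraceEvent {
    pub name: &'static str,
    pub start_us: u128,
    pub duration_us: u128,
}

/// Global sink for finished events (a real profiler might stream to a file).
static EVENTS: Mutex<Vec<TraceEvent>> = Mutex::new(Vec::new());
/// Time origin, set when the first guard is created.
static START: OnceLock<Instant> = OnceLock::new();

/// Measures the time between its creation and its drop.
pub struct ProfileGuard {
    name: &'static str,
    started: Instant,
}

impl ProfileGuard {
    pub fn new(name: &'static str) -> Self {
        START.get_or_init(Instant::now);
        Self { name, started: Instant::now() }
    }
}

impl Drop for ProfileGuard {
    fn drop(&mut self) {
        let origin = *START.get_or_init(Instant::now);
        let event = TraceEvent {
            name: self.name,
            start_us: self.started.duration_since(origin).as_micros(),
            duration_us: self.started.elapsed().as_micros(),
        };
        if let Ok(mut events) = EVENTS.lock() {
            events.push(event);
        }
    }
}

/// Creates a guard that lives until the end of the enclosing scope.
macro_rules! profile {
    ($name:expr) => {
        let _profile_guard = ProfileGuard::new($name);
    };
}

/// Serializes events as a JSON array of complete ("X") trace events.
/// Names are assumed to need no JSON escaping.
pub fn to_chrome_trace_json(events: &[TraceEvent]) -> String {
    let entries: Vec<String> = events
        .iter()
        .map(|e| {
            format!(
                r#"{{"name":"{}","ph":"X","ts":{},"dur":{},"pid":1,"tid":1}}"#,
                e.name, e.start_us, e.duration_us
            )
        })
        .collect();
    format!("[{}]", entries.join(","))
}
```

Placing `profile!("block");` at the top of a scope is enough; the event is pushed when the scope ends, without any explicit stop call.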
This is the result in Google trace:
Or zoomed out with multiple frames:
Now that the tooling is ready and the triangle is drawn, it's time to start making something more than the default triangle. I thought it would be useful to have an infinite grid in my scene so that I can orient myself better.
An infinite grid can be calculated in several ways. I chose to render it in my shader. I followed this tutorial. The basic steps are as follows:
- Create a plane with 4 vertices that covers the entire screen, with x, y coordinates of -1 and 1.
- In the shader, for each vertex, apply the inverse of the view and projection matrices to get the world-space coordinate, and calculate its position at z=0 and z=1, to get the position at 0 (near) and at infinity (far).
- Set the vertex position output to the original clip-space coordinates, without applying any projection.
- Only render the plane where y=0, i.e. the entire ground. For this, use the near and far points from step 2.
- Calculate the grid lines between the near and far points from step 2 with some fancy math (see the linked tutorial).
- Compute the depth, so that the grid is never drawn in front of an object it lies behind.
- Apply blur, fade, and linear depth so that lines in the far distance are less visible. The further away the lines are, the closer they get to each other and the uglier the unwanted artifacts become. It's better to render only up to a certain configurable distance.
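The "only render where y=0" step boils down to intersecting the ray from the near point to the far point with the ground plane. Below is a CPU-side sketch of that math in Rust rather than shader code. `Vec3`, `intersect_ground`, and `distance_fade` are names made up for this illustration, and the linear fade is a simplification of my own, not the tutorial's formula.

```rust
/// Minimal 3-component vector for this sketch.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Vec3 {
    pub x: f32,
    pub y: f32,
    pub z: f32,
}

/// Intersects the ray `near -> far` with the ground plane y = 0.
/// Returns the ray parameter `t` and the hit point, or `None` when the
/// ray is parallel to the plane or the plane lies behind the camera.
pub fn intersect_ground(near: Vec3, far: Vec3) -> Option<(f32, Vec3)> {
    let dy = far.y - near.y;
    if dy == 0.0 {
        return None; // ray runs parallel to the ground
    }
    // Solve near.y + t * dy = 0 for t.
    let t = -near.y / dy;
    if t <= 0.0 {
        return None; // intersection is behind the near point
    }
    let hit = Vec3 {
        x: near.x + t * (far.x - near.x),
        y: 0.0,
        z: near.z + t * (far.z - near.z),
    };
    Some((t, hit))
}

/// Simplified linear fade: 1.0 at the camera, 0.0 at `max_distance`
/// and beyond. `max_distance` is the configurable render distance.
pub fn distance_fade(distance: f32, max_distance: f32) -> f32 {
    (1.0 - distance / max_distance).clamp(0.0, 1.0)
}
```

In the shader, fragments for which no valid intersection exists are simply not drawn, and the fade factor would multiply the alpha of the grid line.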
An interesting problem I hadn’t thought of at first is to support multiple textures. Sounds simple. But in Vulkan I had the following options:
- Texture Array. Provide an array of `Sampler2D` objects where each sampler links to an underlying `ImageView`. Each image view represents a texture in memory. Each object can define an image id. And when rendering, the image id of this object is uploaded to the GPU using push constants.
- Texture Atlas. Basically a sprite sheet. This will not work well if you are working with 2K/4K textures and many different objects. It could be useful for 2D games.
- Swap Texture Memory with the active texture. Basically, you have texture memory hooked up and a descriptor layout pointing to it. At run time, you swap the texture memory with the texture that is required. The downside is that you constantly have to interrupt the GPU and reallocate memory on it during rendering.
- Generate multiple descriptor set layouts and descriptor sets for all texture combinations that can be hooked up. Then, when creating the pipeline, attach the descriptor layouts, and when rendering, attach only the descriptor sets that point to the texture being rendered. I am not even sure if this would work, and it sounds pretty complex.
Of those four, I found that 1 and 2 are the most used. I went with the texture array since it is quite dynamic and gives a lot of control over which textures are bound per object. The advantage of this method is the ability to leverage push constants, which are quite fast. The other methods sound more or less like odd solutions to the problem, since you are either updating GPU memory or adding a lot of redundancy by creating different descriptor sets.
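On the CPU side, the texture-array approach can be sketched as a registry that hands out array indices, plus a small push-constant block carrying those indices per object. This is only an illustration under my own assumptions: `TextureArray`, `PushConstants`, and the two index fields are hypothetical, and the real engine may lay out its push constants differently.

```rust
/// Hypothetical CPU-side registry. The position of a name in `names`
/// mirrors the slot of its image view in the shader's sampler array.
pub struct TextureArray {
    names: Vec<String>,
}

impl TextureArray {
    pub fn new() -> Self {
        Self { names: Vec::new() }
    }

    /// Returns the array index for a texture, adding it if it is new.
    pub fn register(&mut self, name: &str) -> u32 {
        if let Some(index) = self.names.iter().position(|n| n == name) {
            return index as u32;
        }
        self.names.push(name.to_string());
        (self.names.len() - 1) as u32
    }
}

/// Hypothetical per-object push-constant block: which slots to sample.
#[repr(C)]
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct PushConstants {
    pub albedo_index: u32,
    pub normal_index: u32,
}

impl PushConstants {
    /// Packs the block into the raw bytes that a push-constant upload
    /// expects, using the native byte order of the host.
    pub fn to_bytes(&self) -> [u8; 8] {
        let mut bytes = [0u8; 8];
        bytes[..4].copy_from_slice(&self.albedo_index.to_ne_bytes());
        bytes[4..].copy_from_slice(&self.normal_index.to_ne_bytes());
        bytes
    }
}
```

Because only a few bytes change per object, switching textures between draw calls does not require rebinding descriptor sets.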
Mouse Object Picking
I wanted to be able to select an object so I could show its components in the UI. There are several ways to do this:
- Usage of a compute shader. Sample the color output at the cursor location and write the result into a host-visible buffer, or do a sub-image copy (the part around the cursor) from the picker texture to a smaller host-readable linear image or buffer after the render pass. (I found this on Reddit.)
- Usage of ray casting. This method is only possible if a physics engine has been implemented. If that is the case, it is relatively easy to shoot a ray from the mouse coordinates and calculate all intersections. Then it is just a matter of checking which object is hit first.
- Usage of a separate render pass. This is the method I use. Basically, you render the id of each object into the pixels it covers. You can encode an object id in an RGB value, but the format of a Vulkan image can also be changed to an integer format, so you can store the object id directly in a pixel. After rendering, you copy the contents of the Vulkan image to a Vulkan buffer and map the memory to application memory. Then you can use the x, y mouse coordinates to extract the object id from the given pixel and know which object is selected. To optimize further, it is possible to render only the region being clicked and copy just that region from the image.
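The RGB variant of the render-pass method can be sketched as follows. The sketch assumes the picking image was copied into a tightly packed RGBA8 buffer (4 bytes per pixel, no row padding) and that id 0 means "no object". Both are assumptions made for this illustration, as are the function names.

```rust
/// Encodes a 24-bit object id into the R, G, and B channels.
pub fn id_to_rgb(id: u32) -> [u8; 3] {
    [
        (id & 0xFF) as u8,
        ((id >> 8) & 0xFF) as u8,
        ((id >> 16) & 0xFF) as u8,
    ]
}

/// Decodes an object id from the R, G, and B channels.
pub fn rgb_to_id(rgb: [u8; 3]) -> u32 {
    (rgb[0] as u32) | ((rgb[1] as u32) << 8) | ((rgb[2] as u32) << 16)
}

/// Reads the object id under the cursor from the mapped pixel buffer.
/// Returns `None` outside the image or when the pixel holds id 0,
/// which this sketch treats as background.
pub fn pick(pixels: &[u8], width: u32, height: u32, x: u32, y: u32) -> Option<u32> {
    if x >= width || y >= height {
        return None;
    }
    // 4 bytes per pixel (RGBA8), rows stored one after another.
    let offset = (y as usize * width as usize + x as usize) * 4;
    let px = pixels.get(offset..offset + 3)?;
    match rgb_to_id([px[0], px[1], px[2]]) {
        0 => None,
        id => Some(id),
    }
}
```

With an integer image format the encode and decode steps disappear, and `pick` would read a single `u32` per pixel instead.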
For the UI I used imgui-rs, a Rust bindings library on top of the C++ Dear ImGui library. In short, it was a drama to get this to work with Rust and Vulkan. Rust is known for GUI hell and Vulkan is known for API hell, so that is a good combo. ImGui returns `DrawData`, which contains vertices, indices, render commands, font textures for color sampling, and so on. You can use this data to render the UI with your own Vulkan implementation.
If the UI is moved, scaled, or collapsed, the vertex/index data changes. When that happens, the vertex/index buffer must be updated or reallocated and the old memory freed. Also, the font sample texture can change per UI element, and for each render command the scissor rectangle, the index and vertex offsets into the main buffers, and the number of indices must be updated. In addition, commands have to be recorded at run time, since they can vary with changes to the UI. Since I had not seen any of these things coming, I had to do a complete rewrite of a lot of internal logic. I miss C++ and OpenGL, where you could render something with `ImGui::Text()` :). And Rust, with all its borrowing rules, does not make it much easier. But the result is worth it after a week of plowing:
A mesh has static vertices and indices, so the number of render commands does not change (as long as the mesh is not modified). The UI, however, changes, and as a result the number of render commands changes at run time. Because of this, I had to support dynamic run-time command recording. At first it seemed like a good idea to support both static and dynamic command recording. However, while implementing this, I was advised on the Vulkan Discord that command recording takes hardly any time at all and that you are better off recording every frame and resetting the command buffer after rendering finishes. Caching static recordings would be an unnecessary optimization.
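Re-recording every frame can be sketched without any Vulkan calls by treating the command buffer as a plain list that is cleared and refilled from the UI's draw commands. `UiDrawCmd`, `Recorded`, and `record_ui` are hypothetical types for this illustration. The clip rectangle is assumed to be given as `[x1, y1, x2, y2]` in framebuffer pixels, which is then clamped and turned into a scissor offset and extent.

```rust
/// Hypothetical mirror of one UI draw command.
#[derive(Clone, Copy, Debug)]
pub struct UiDrawCmd {
    pub clip_rect: [f32; 4], // x1, y1, x2, y2 in framebuffer pixels
    pub index_count: u32,
    pub index_offset: u32,
    pub vertex_offset: i32,
}

/// Hypothetical stand-ins for the commands written to a command buffer.
#[derive(Debug, PartialEq)]
pub enum Recorded {
    SetScissor { x: i32, y: i32, width: u32, height: u32 },
    DrawIndexed { index_count: u32, first_index: u32, vertex_offset: i32 },
}

/// Clears the list (like resetting the command buffer) and records the
/// scissor and draw call for every visible UI command of this frame.
pub fn record_ui(commands: &mut Vec<Recorded>, draw_cmds: &[UiDrawCmd], framebuffer: [f32; 2]) {
    commands.clear();
    for cmd in draw_cmds {
        // Clamp the clip rectangle to the framebuffer.
        let x1 = cmd.clip_rect[0].max(0.0);
        let y1 = cmd.clip_rect[1].max(0.0);
        let x2 = cmd.clip_rect[2].min(framebuffer[0]);
        let y2 = cmd.clip_rect[3].min(framebuffer[1]);
        if x2 <= x1 || y2 <= y1 {
            continue; // fully clipped, nothing to draw
        }
        commands.push(Recorded::SetScissor {
            x: x1 as i32,
            y: y1 as i32,
            width: (x2 - x1) as u32,
            height: (y2 - y1) as u32,
        });
        commands.push(Recorded::DrawIndexed {
            index_count: cmd.index_count,
            first_index: cmd.index_offset,
            vertex_offset: cmd.vertex_offset,
        });
    }
}
```

Since the list is rebuilt from scratch each frame, a UI that grows or shrinks needs no special handling; the number of recorded commands simply follows the draw data.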
The LearnOpenGL website is a great resource on physically based rendering (PBR) with a BRDF. I am using high-resolution assets from Quixel Bridge to test my implementation. The image shows a sphere with albedo, roughness, metallic, normal, and AO maps. An object indicates via push constants which textures it uses from the texture array. Those are then used for calculating the refraction, reflection, irradiance, and radiance. Currently, I only have one light source, which can be moved at run time with the UI.
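As a small taste of the math involved, here are two of the terms from the Cook-Torrance specular BRDF as covered in LearnOpenGL's PBR chapters: the Fresnel-Schlick approximation and the GGX normal distribution. I am assuming that formulation here. The functions are written as scalar Rust for illustration and are not taken from the engine's shaders.

```rust
use std::f32::consts::PI;

/// Fresnel-Schlick approximation: the fraction of light that is reflected.
/// `cos_theta` is the cosine of the angle between the view direction and
/// the halfway vector; `f0` is the reflectance at normal incidence.
pub fn fresnel_schlick(cos_theta: f32, f0: f32) -> f32 {
    f0 + (1.0 - f0) * (1.0 - cos_theta).clamp(0.0, 1.0).powi(5)
}

/// GGX (Trowbridge-Reitz) normal distribution function.
/// `n_dot_h` is the cosine between the surface normal and the halfway
/// vector; roughness is squared first, as in the LearnOpenGL version.
pub fn distribution_ggx(n_dot_h: f32, roughness: f32) -> f32 {
    let a = roughness * roughness;
    let a2 = a * a;
    let denom = n_dot_h * n_dot_h * (a2 - 1.0) + 1.0;
    a2 / (PI * denom * denom)
}
```

Looking straight at a surface (`cos_theta = 1`) gives back `f0`, while grazing angles push the reflectance toward 1, which is why rough dielectric spheres still show bright rims.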
Asset Loading Time Problem
If an object uses five 2K material textures, and loading them from disk sequentially, without any specific optimization, takes 10 seconds, how much longer would it take with 8K textures or scenes with many different objects? To solve this problem, I looked at different libraries.
- Distill. Too much configuration and work is required to get it working, and the examples looked a bit too confusing. However, the project seems to be actively developed, and I look forward to its future releases.
- Bevy Assets. Depends on the Bevy engine, which I don't want to depend on.
- Asset Manager. Hard to get working in an async context.
- Asset. Does support async, but hasn't been maintained for 3 years.
After trying all those libraries, I went for my own implementation. It uses the tokio async file reader, awaits the file read task in the tokio runtime, and, once the bytes are read, uses the `image` library to interpret them. Based on that, the corresponding image buffers, texture instances, and texture samplers for the texture array are generated. With this method, the loading time changed drastically, from 26 seconds for 5 textures to 4 seconds. That is still a lot; in the next steps I may look at which textures are needed most, apply mipmapping, or introduce some debug modes where only the objects I am working on are rendered in detail. I really wonder how engines such as Unreal tackle this issue. Anyhow, smol might serve this use case better than tokio, since it is more lightweight than the full-fledged tokio library. A similar workflow can be used for the object files that contain thousands of vertices and other data points.
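The core idea, starting all loads at once instead of one after another, can be sketched with the standard library alone. Note that this sketch swaps tokio tasks for scoped OS threads so that it has no dependencies; `load_all` and its closure parameter are hypothetical and not the engine's API.

```rust
use std::thread;

/// Runs `load` for every path on its own scoped thread and returns the
/// results in the same order as `paths`.
pub fn load_all<T, F>(paths: &[&str], load: F) -> Vec<T>
where
    T: Send,
    F: Fn(&str) -> T + Sync,
{
    let load = &load;
    thread::scope(|scope| {
        // Start every load before waiting on any of them.
        let handles: Vec<_> = paths
            .iter()
            .map(|&path| scope.spawn(move || load(path)))
            .collect();
        // Joining in order keeps results aligned with `paths`.
        handles
            .into_iter()
            .map(|handle| handle.join().expect("loader thread panicked"))
            .collect()
    })
}
```

For real files the closure could be `|path| std::fs::read(path)`, followed by decoding the bytes; the decode step is usually the expensive part, so running it concurrently is where most of the gain would come from.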
Last but not least: initially, I used OBJ files. However, with many scene objects, it is difficult to maintain them in the game world. As an alternative, two other well-known formats exist: FBX and glTF. FBX is very widely used in the graphics industry and has been around roughly 10 years longer than the first version of glTF. Both formats can store animations, materials, transformations, buffers, meshes, and more. glTF's advantages are that it is created by a non-profit organization, the Khronos Group, it is not subject to licensing, it is simply a specification, and it is (or can be) human-readable (JSON). On the other hand, the FBX format is owned by Autodesk, needs to be used through the FBX SDK, and is subject to licensing. For these reasons, I decided to create a Vulkan glTF viewer in Rust using the `gltf` library. An example of this can be found in the source code of the engine.
Vulkan is a great, specific, direct, and clear API that operates without magic from underlying state machines. It takes some time to learn, but this time is well spent: it forces you to go through each step of the graphics pipeline and make choices along the way to customize the pipeline to your needs. The engine I am creating is still in progress, and I will be adding more here in the near future.
GitHub Source: https://github.com/TimonPost/anasaizi