VkComputeUtils.jl
This package simplifies the most basic and repetitive boilerplate code required to run "pure" Vulkan compute applications.
We attempt not to impose any framework on the computation; instead, we provide a library of small helpers that automate the boring parts. Usage is best documented by the examples in the test suite, for example the simple shader test, which reproduces the "minimal working compute" tutorial from Vulkan.jl.
Here, we highlight the main parts.
Choosing suitable facilities
The vital choices of hardware, queues, and memory types are made by functions that take a predicate and a scoring function and return the best-scoring match that satisfies the predicate. This way, you can e.g. select the physical GPU with the most available memory.
Finding a good physical device
To find a good physical device, use find_physical_device. This example returns the first physical device whose name contains the name of your favorite vendor:
physical_device = find_physical_device(
    instance,
    check = props -> contains(props.device_name, "MyFavoriteVendor"),
)
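If several devices pass the check, a scoring function can pick the best one. The following is a minimal sketch, under the assumption that the scoring function receives the same physical-device properties structure as the predicate; it prefers discrete GPUs over integrated ones:

# Sketch only: prefer discrete GPUs (assumes `score` receives the same
# properties structure that `check` does).
physical_device = find_physical_device(
    instance,
    check = props -> true, # accept any device
    score = props -> props.device_type == PHYSICAL_DEVICE_TYPE_DISCRETE_GPU ? 1 : 0,
)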
Finding a queue family index
Similarly, find_queue_family_idx is used to find a suitable queue family. This example finds a compute-capable queue family with the greatest number of queues:
qfam_idx = find_queue_family_idx(
    physical_device,
    check = qfp -> hasbits(qfp.queue_flags, QUEUE_COMPUTE_BIT),
    score = qfp -> qfp.queue_count,
)
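The returned index can then be used to create the logical device and fetch the queue. A sketch with plain Vulkan.jl calls (assuming the index is the raw Vulkan queue family index; queue priority 1.0, no layers or extensions):

# Sketch only: standard Vulkan.jl calls, not part of VkComputeUtils.jl.
device = Device(physical_device, [DeviceQueueCreateInfo(qfam_idx, [1.0])], [], [])
queue = get_device_queue(device, qfam_idx, 0) # first queue of the family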
Finding a memory type
Once more, the same scheme is used for finding suitable memory types, except that the predicate and scoring functions may also examine the properties of the corresponding memory heap. This finds a device-local, host-visible memory type with the largest heap:
memorytype_local_visible = find_memory_type_idx(
    physical_device,
    check = (mt, heap) -> hasbits(
        mt.property_flags,
        MEMORY_PROPERTY_DEVICE_LOCAL_BIT | MEMORY_PROPERTY_HOST_VISIBLE_BIT,
    ),
    score = (mt, heap) -> heap.size,
)
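The memory type index is then used when allocating memory for your buffers. Below is a minimal sketch with plain Vulkan.jl calls; the buffer name and size are illustrative, and error handling together with the get_buffer_memory_requirements check is omitted for brevity:

# Sketch only: create a storage buffer and back it with the selected memory type.
buffer_size = 100 * sizeof(Float32)
mybuffer = Buffer(device, buffer_size, BUFFER_USAGE_STORAGE_BUFFER_BIT,
                  SHARING_MODE_EXCLUSIVE, [qfam_idx])
mymemory = DeviceMemory(device, buffer_size, memorytype_local_visible)
bind_buffer_memory(device, mybuffer, mymemory, 0) # bind at offset 0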
Compiling compute shaders into pipelines
The scaffolding around the compute shader (pipeline layout, descriptor set layout, pipeline, descriptor pool, and descriptor set) is grouped within a ComputeShaderPipeline structure. To create one, use the constructor as follows:
example_shader = """
layout (local_size_x = 1024) in;

layout (push_constant) uniform Params
{
    uint n;
} params;

layout (std430, binding = 0) buffer databuf
{
    float data[];
};

void main()
{
    uint i = gl_GlobalInvocationID.x;
    if(i < params.n) data[i] = 123456;
}
"""
shader = ComputeShaderPipeline(
    device, # created by create_device
    example_shader, # the shader code; compiled with glslangValidator
    1, # there is 1 buffer binding
    (), # there are no specialization constants
    UInt32, # the type of the push constants in the shader
)
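For a single shader constant, a primitive type such as UInt32 suffices. If the shader declares several push constants, a natural choice would be an isbits struct whose layout matches the GLSL push-constant block; this is only a hypothetical sketch, assuming the constructor accepts such types:

# Hypothetical: a struct mirroring a GLSL block `{ uint n; float scale; }`.
struct MyParams
    n::UInt32
    scale::Float32
end
# ...which would then be passed in place of UInt32 above:
# shader = ComputeShaderPipeline(device, example_shader, 1, (), MyParams)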
Using the compiled shader
You can now easily bind your buffers (backed by allocated memory) to the shader pipeline:
write_descriptor_set_buffers(device, shader, [mybuffer])
Later, you may add the command-buffer entries for binding the pipeline, pushing the constants, binding the descriptor sets, and dispatching the pipeline, all in one call:
consts = hold_push_constants(shader, my_n_items) # materializes the push constants (keep a reference so that it is not garbage-collected while in use)
cmd_bind_dispatch(
    command_buffer, # created with allocate_command_buffers and "started" with begin_command_buffer
    shader, # created with ComputeShaderPipeline, as above
    consts, # push constants data
    n_blocks(my_n_items, 1024), # number of workgroups ("threadblocks") in the x direction
    1, # same for y
    1, # z
)
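The recorded commands still have to be submitted. The following sketch uses plain Vulkan.jl calls to end the command buffer, submit it to a queue from the compute family found above, and wait for completion:

# Sketch only: standard Vulkan.jl submission, not part of VkComputeUtils.jl.
end_command_buffer(command_buffer)
queue = get_device_queue(device, qfam_idx, 0)
queue_submit(queue, [SubmitInfo([], [], [command_buffer], [])]) # no semaphores
queue_wait_idle(queue) # block until the compute job finishes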
Viewing the host-accessible memory
The mapped Vulkan memory can be viewed through Julia vectors, using some minor unsafe assumptions. map_memory_as_vector manages the conversion, using properly "typed" offsets and sizes:
my_vector = map_memory_as_vector(device, some_memory, 0, 100, Float32)
length(my_vector) == 100
eltype(my_vector) == Float32
my_vector .= 0 # zero out the Vulkan memory
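After the dispatch has finished, the same view can be used to read the results back. Note that this is only safe if the memory type is also host-coherent (MEMORY_PROPERTY_HOST_COHERENT_BIT); otherwise the mapped range must be invalidated before reading. With coherent memory, a plain copy suffices:

results = copy(my_vector) # snapshot the shader output into an ordinary Julia Vector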