Module cooperative_matrix


Cooperative Matrix Multiplication Example

This example demonstrates how to use cooperative matrix operations (the capability exposed by tensor cores on NVIDIA GPUs and simdgroup matrix operations on Apple GPUs) to perform efficient matrix multiplication.

Cooperative matrices allow a workgroup to collectively load, store, and perform matrix operations on small tiles of data, enabling hardware-accelerated matrix math.
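The accumulation pattern the hardware performs can be sketched on the CPU: the output matrix is covered by small tiles, and each tile accumulates products of matching input tiles. This is an illustrative analogue only (the names TILE and matmul_tiled are hypothetical and not part of this module's API), assuming an 8x8 tile and row-major f32 matrices:

```rust
// CPU sketch of the tile-wise accumulation that cooperative matrices
// perform in hardware. TILE and matmul_tiled are illustrative names.
const TILE: usize = 8;

// C (m x n) += A (m x k) * B (k x n), processed one tile at a time.
fn matmul_tiled(a: &[f32], b: &[f32], c: &mut [f32], m: usize, n: usize, k: usize) {
    assert!(m % TILE == 0 && n % TILE == 0 && k % TILE == 0);
    for ti in (0..m).step_by(TILE) {
        for tj in (0..n).step_by(TILE) {
            // Accumulate over K in TILE-sized steps, as the hardware
            // would with one cooperative multiply-add per step.
            for tk in (0..k).step_by(TILE) {
                for i in ti..ti + TILE {
                    for j in tj..tj + TILE {
                        let mut acc = c[i * n + j];
                        for p in tk..tk + TILE {
                            acc += a[i * k + p] * b[p * n + j];
                        }
                        c[i * n + j] = acc;
                    }
                }
            }
        }
    }
}

fn main() {
    // Multiplying by an 8x8 identity returns the other operand unchanged.
    let m = 8;
    let mut a = vec![0.0f32; m * m];
    for i in 0..m {
        a[i * m + i] = 1.0;
    }
    let b: Vec<f32> = (0..m * m).map(|v| v as f32).collect();
    let mut c = vec![0.0f32; m * m];
    matmul_tiled(&a, &b, &mut c, m, m, m);
    assert_eq!(c, b);
}
```

On the GPU, the inner 8x8 work is done collectively by the workgroup in one hardware instruction rather than by scalar loops.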

Note: This feature requires hardware support and is currently experimental. Use adapter.cooperative_matrix_properties() to query supported configurations:

  • Metal (Apple): 8x8 f32, 8x8 f16, mixed precision (f16 inputs, f32 accumulator)
  • Vulkan (AMD): Typically 16x16 f16
  • Vulkan (NVIDIA): Varies by GPU generation

Structs§

Dimensions 🔒
ExecuteResults 🔒

Constants§

K 🔒
M 🔒
Matrix dimensions for our example (must be divisible by tile size)
N 🔒
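The divisibility constraint on M, N, and K exists because the hardware consumes whole tiles: the dispatch covers the output with M/TILE by N/TILE tiles, each accumulating over K/TILE steps. A minimal sketch of that bookkeeping, assuming illustrative values for the module's private M, N, K constants and the 8x8 tile size from the Metal configuration above:

```rust
// Hypothetical values standing in for the module's private constants;
// TILE matches the 8x8 Metal configuration listed above.
const M: usize = 256;
const N: usize = 256;
const K: usize = 256;
const TILE: usize = 8;

// Number of tiles needed to cover an m x n output matrix.
fn tile_grid(m: usize, n: usize, tile: usize) -> (usize, usize) {
    assert!(
        m % tile == 0 && n % tile == 0,
        "matrix dimensions must be divisible by the tile size"
    );
    (m / tile, n / tile)
}

fn main() {
    let (rows, cols) = tile_grid(M, N, TILE);
    assert_eq!((rows, cols), (32, 32));
    // Each output tile accumulates over K / TILE input-tile pairs.
    assert_eq!(K / TILE, 32);
}
```

If the dimensions were not tile-divisible, the kernel would need padding or edge handling, which this example avoids by construction.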

Functions§

execute 🔒
main
run 🔒