Module cooperative_matrix


Cooperative Matrix Multiplication Example

This example demonstrates how to use cooperative matrix operations (the capability exposed by tensor cores on NVIDIA GPUs and simdgroup matrix operations on Apple GPUs) to perform efficient matrix multiplication.

Cooperative matrices allow a workgroup to collectively load, store, and perform matrix operations on small tiles of data, enabling hardware-accelerated matrix math.
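The accumulation pattern the hardware performs can be sketched on the CPU: the output matrix is covered by small tiles, and each tile accumulates products of matching input tiles. This is an illustrative analogue only (the names TILE and matmul_tiled are hypothetical and not part of this module's API), assuming an 8x8 tile and row-major f32 matrices:

```rust
// CPU sketch of the tile-wise accumulation that cooperative matrices
// perform in hardware. TILE and matmul_tiled are illustrative names.
const TILE: usize = 8;

// C (m x n) += A (m x k) * B (k x n), processed one tile at a time.
fn matmul_tiled(a: &[f32], b: &[f32], c: &mut [f32], m: usize, n: usize, k: usize) {
    assert!(m % TILE == 0 && n % TILE == 0 && k % TILE == 0);
    for ti in (0..m).step_by(TILE) {
        for tj in (0..n).step_by(TILE) {
            // Accumulate over K in TILE-sized steps, as the hardware
            // would with one cooperative multiply-add per step.
            for tk in (0..k).step_by(TILE) {
                for i in ti..ti + TILE {
                    for j in tj..tj + TILE {
                        let mut acc = c[i * n + j];
                        for p in tk..tk + TILE {
                            acc += a[i * k + p] * b[p * n + j];
                        }
                        c[i * n + j] = acc;
                    }
                }
            }
        }
    }
}

fn main() {
    // Multiplying by an 8x8 identity returns the other operand unchanged.
    let m = 8;
    let mut a = vec![0.0f32; m * m];
    for i in 0..m {
        a[i * m + i] = 1.0;
    }
    let b: Vec<f32> = (0..m * m).map(|v| v as f32).collect();
    let mut c = vec![0.0f32; m * m];
    matmul_tiled(&a, &b, &mut c, m, m, m);
    assert_eq!(c, b);
}
```

On the GPU, the inner 8x8 work is done collectively by the workgroup in one hardware instruction rather than by scalar loops.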

Note: This feature requires hardware support and is currently experimental. Use adapter.cooperative_matrix_properties() to query supported configurations:

  • Metal (Apple): 8x8 f32, 8x8 f16, mixed precision (f16 inputs, f32 accumulator)
  • Vulkan (AMD): Typically 16x16 f16
  • Vulkan (NVIDIA): Varies by GPU generation

Structs§

Dimensions 🔒
ExecuteResults 🔒

Constants§

K 🔒
M 🔒
Matrix dimensions for our example (must be divisible by tile size)
N 🔒
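The divisibility constraint on M, N, and K exists because the hardware consumes whole tiles: the dispatch covers the output with M/TILE by N/TILE tiles, each accumulating over K/TILE steps. A minimal sketch of that bookkeeping, assuming illustrative values for the module's private M, N, K constants and the 8x8 tile size from the Metal configuration above:

```rust
// Hypothetical values standing in for the module's private constants;
// TILE matches the 8x8 Metal configuration listed above.
const M: usize = 256;
const N: usize = 256;
const K: usize = 256;
const TILE: usize = 8;

// Number of tiles needed to cover an m x n output matrix.
fn tile_grid(m: usize, n: usize, tile: usize) -> (usize, usize) {
    assert!(
        m % tile == 0 && n % tile == 0,
        "matrix dimensions must be divisible by the tile size"
    );
    (m / tile, n / tile)
}

fn main() {
    let (rows, cols) = tile_grid(M, N, TILE);
    assert_eq!((rows, cols), (32, 32));
    // Each output tile accumulates over K / TILE input-tile pairs.
    assert_eq!(K / TILE, 32);
}
```

If the dimensions were not tile-divisible, the kernel would need padding or edge handling, which this example avoids by construction.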

Functions§

execute 🔒
main
run 🔒