Function __tile_mmultf32ps 

pub unsafe fn __tile_mmultf32ps(
    dst: *mut __tile1024i,
    a: __tile1024i,
    b: __tile1024i,
)
🔬This is a nightly-only experimental API. (x86_amx_intrinsics #126622)
Available on x86-64 and target feature amx-tf32 only.

Performs matrix multiplication of two tiles a and b containing packed single-precision (32-bit) floating-point elements, which are converted to TF32 (tensor-float32) format, and accumulates the results into a packed single-precision tile. For each possible combination of (row of a, column of b), it performs the following steps:

  • convert the elements to TF32
  • multiply the corresponding elements of a and b
  • accumulate the results into the corresponding row and column of dst using round-to-nearest-even rounding mode

Output FP32 denormals are always flushed to zero. Input single-precision denormals are always handled and are not treated as zero.

The shape of each tile is specified in its __tile1024i struct. The tile register is allocated by the compiler.
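The per-element steps described above can be approximated in plain scalar Rust. The following is a reference sketch only, not the intrinsic itself and not a bit-exact model of the hardware. The helper names `to_tf32` and `mmul_tf32_ref` are hypothetical, the row-major slices with explicit `m`, `k`, `n` dimensions stand in for the tile shape, and the FP32-to-TF32 conversion is assumed to round to nearest even (keeping 10 explicit mantissa bits). The order and internal precision of the hardware accumulation are not modeled.

```rust
/// Hypothetical helper: round an f32 to TF32 precision (1 sign bit,
/// 8 exponent bits, 10 explicit mantissa bits), ties to even.
/// Assumption: the conversion rounds to nearest even; NaN and
/// infinities are passed through unchanged.
fn to_tf32(x: f32) -> f32 {
    if !x.is_finite() {
        return x;
    }
    let bits = x.to_bits();
    // The 13 low mantissa bits are dropped; `lsb` is the lowest kept bit.
    let lsb = (bits >> 13) & 1;
    let rounded = bits.wrapping_add(0x0FFF + lsb);
    f32::from_bits(rounded & !0x1FFF)
}

/// Hypothetical scalar reference: dst (m x n) += a (m x k) * b (k x n),
/// all row-major, with inputs converted to TF32 before multiplying.
fn mmul_tf32_ref(dst: &mut [f32], a: &[f32], b: &[f32], m: usize, k: usize, n: usize) {
    assert_eq!(dst.len(), m * n);
    assert_eq!(a.len(), m * k);
    assert_eq!(b.len(), k * n);
    for i in 0..m {
        for j in 0..n {
            let mut acc = dst[i * n + j];
            for p in 0..k {
                acc += to_tf32(a[i * k + p]) * to_tf32(b[p * n + j]);
            }
            // Flush a denormal FP32 result to (signed) zero.
            if acc.is_subnormal() {
                acc = 0.0_f32.copysign(acc);
            }
            dst[i * n + j] = acc;
        }
    }
}
```

Calling `mmul_tf32_ref(&mut dst, &a, &b, 2, 2, 2)` on 2x2 inputs accumulates the TF32-rounded products into `dst` in place. A model like this may be useful for checking results from the intrinsic within a tolerance, but exact agreement should not be expected.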