_tile_mmultf32ps

pub unsafe fn _tile_mmultf32ps(const DST: i32, const A: i32, const B: i32)
🔬This is a nightly-only experimental API. (x86_amx_intrinsics #126622)
Available on x86-64 and target feature amx-tf32 only.

Performs matrix multiplication of two tiles, a and b, each containing packed single-precision (32-bit) floating-point elements. The elements are converted to TF32 (tensor-float32) format before being multiplied, and the results are accumulated into dst, a tile of packed single-precision elements. For each combination of (row of a, column of b), the instruction performs the following steps:

  • convert the elements of a and b to TF32
  • multiply the corresponding elements of a and b
  • accumulate the results into the corresponding row and column of dst using the round-to-nearest-even rounding mode. Output FP32 denormals are always flushed to zero; input single-precision denormals are always handled and are not treated as zero.
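
The three steps above can be approximated in portable Rust as a scalar reference model, which may help when checking results from the intrinsic. This is only a sketch under stated assumptions, not the hardware algorithm: the helper names (to_tf32, flush_denormal, mmul_tf32_ref) are hypothetical, matrices are plain row-major slices rather than tile registers, the TF32 conversion is assumed to round to nearest even on the 13 discarded mantissa bits, flushed denormals are assumed to keep their sign, and the accumulation order is assumed to be sequential. Real hardware may differ in any of these details.

```rust
/// Hypothetical helper: reduce an f32 to TF32 precision (10 explicit
/// mantissa bits). ASSUMPTION: the 13 discarded bits are rounded to
/// nearest even; the hardware conversion may behave differently.
fn to_tf32(x: f32) -> f32 {
    if x.is_nan() || x.is_infinite() {
        return x;
    }
    let bits = x.to_bits();
    // Lowest mantissa bit that survives; used to break ties toward even.
    let lsb = (bits >> 13) & 1;
    let rounded = bits.wrapping_add(0x0FFF + lsb) & !0x1FFF;
    f32::from_bits(rounded)
}

/// Hypothetical helper: flush a denormal result to zero.
/// ASSUMPTION: the sign of the flushed value is preserved.
fn flush_denormal(x: f32) -> f32 {
    if x.is_subnormal() {
        0.0f32.copysign(x)
    } else {
        x
    }
}

/// Hypothetical reference model: dst (m x n) += a (m x k) * b (k x n),
/// all row-major. ASSUMPTION: products are accumulated one at a time in
/// increasing k order, with each partial sum flushed if denormal.
fn mmul_tf32_ref(dst: &mut [f32], a: &[f32], b: &[f32], m: usize, k: usize, n: usize) {
    assert_eq!(dst.len(), m * n);
    assert_eq!(a.len(), m * k);
    assert_eq!(b.len(), k * n);
    for i in 0..m {
        for j in 0..n {
            let mut acc = dst[i * n + j];
            for p in 0..k {
                // Step 1: convert both inputs to TF32.
                let x = to_tf32(a[i * k + p]);
                let y = to_tf32(b[p * n + j]);
                // Steps 2 and 3: multiply, then accumulate in f32
                // (Rust's f32 arithmetic rounds to nearest even).
                acc = flush_denormal(acc + x * y);
            }
            dst[i * n + j] = acc;
        }
    }
}
```

Because TF32 keeps only 10 explicit mantissa bits, inputs such as 1.0 + 2^-11 lose their low-order bits before the multiply, so results from this model (and from the intrinsic) can differ from a full-precision f32 matrix product.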