pub unsafe fn _tile_mmultf32ps<const DST: i32, const A: i32, const B: i32>()
🔬This is a nightly-only experimental API. (x86_amx_intrinsics #126622)
Available on x86-64 and target feature amx-tf32 only.
Performs matrix multiplication of two tiles a and b, containing packed single-precision (32-bit) floating-point elements that are converted to TF32 (tensor-float32) format, and accumulates the results into a packed single-precision tile. For each possible combination of (row of a, column of b), it performs the following steps:
- convert to TF32
- multiply the corresponding elements of a and b
- accumulate the results into the corresponding row and column of dst using round-to-nearest-even rounding mode. Output FP32 denormals are always flushed to zero; input single-precision denormals are always handled and are not treated as zero.
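
The three steps above can be illustrated with a scalar reference model in plain Rust. This is only a sketch for reasoning about the numerics, not the hardware implementation. The helper names (`to_tf32`, `flush_denormal`, `mmul_tf32_reference`), the row-major slice layout, and the explicit `m`, `k`, `n` dimensions are hypothetical. It also assumes that the FP32-to-TF32 conversion rounds to nearest even and that a flushed result keeps its sign. The order of accumulation and any intermediate rounding on real hardware may differ from this loop.

```rust
/// Round an f32 to TF32 precision (1 sign, 8 exponent, 10 mantissa bits),
/// returned as an f32 whose low 13 mantissa bits are zero.
/// Assumption: round-to-nearest-even; NaN and infinity pass through unchanged.
fn to_tf32(x: f32) -> f32 {
    if !x.is_finite() {
        return x;
    }
    let bits = x.to_bits();
    // Lowest mantissa bit that survives the conversion (used to break ties).
    let lsb = (bits >> 13) & 1;
    // Add just under half of the dropped range, plus the tie-break bit,
    // then clear the 13 dropped bits.
    let rounded = (bits + 0x0FFF + lsb) & !0x1FFF;
    f32::from_bits(rounded)
}

/// Flush a subnormal FP32 value to zero (assumed here to keep its sign).
fn flush_denormal(x: f32) -> f32 {
    if x.is_subnormal() {
        0.0f32.copysign(x)
    } else {
        x
    }
}

/// Hypothetical scalar model: dst (m x n) += tf32(a) (m x k) * tf32(b) (k x n).
/// All matrices are row-major slices.
fn mmul_tf32_reference(
    dst: &mut [f32],
    a: &[f32],
    b: &[f32],
    m: usize,
    k: usize,
    n: usize,
) {
    assert_eq!(dst.len(), m * n);
    assert_eq!(a.len(), m * k);
    assert_eq!(b.len(), k * n);
    for i in 0..m {
        for j in 0..n {
            let mut acc = dst[i * n + j];
            for p in 0..k {
                // Convert both inputs to TF32, multiply, then accumulate in FP32.
                acc += to_tf32(a[i * k + p]) * to_tf32(b[p * n + j]);
            }
            // Output denormals are flushed to zero.
            dst[i * n + j] = flush_denormal(acc);
        }
    }
}
```

A model like this can serve as a comparison baseline when testing code that uses the intrinsic, provided the comparison tolerates small differences that come from accumulation order.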