```rust
pub unsafe fn __tile_mmultf32ps(
    dst: *mut __tile1024i,
    a: __tile1024i,
    b: __tile1024i,
)
```

🔬 This is a nightly-only experimental API. (x86_amx_intrinsics #126622)

Available on x86-64 and target feature amx-tf32 only.
Performs matrix multiplication of two tiles `a` and `b` containing packed single-precision (32-bit) floating-point elements, which are converted to TF32 (TensorFloat-32) format, and accumulates the results into the packed single-precision tile `dst`. For each combination of (row of `a`, column of `b`), it performs the following steps:
- convert the elements to TF32;
- multiply each element of the row of `a` by the corresponding element of the column of `b`;
- accumulate the results into the corresponding row and column of `dst`, using the round-to-nearest-even rounding mode.
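The per-element steps above can be modeled in scalar Rust as `dst[m][n] += Σ_k tf32(a[m][k]) * tf32(b[k][n])`. This is a minimal reference sketch, not the intrinsic and not Intel's exact specification. It assumes the TF32 conversion truncates the 13 low mantissa bits (TF32 keeps 10 of f32's 23 explicit mantissa bits); the hardware may convert or order the accumulation differently, so results can differ in the last bits, and NaN and denormal handling are not modeled. `to_tf32` and `mmul_tf32_ref` are hypothetical helper names.

```rust
/// Convert an f32 to TF32 precision by clearing the 13 low mantissa bits.
/// ASSUMPTION: truncation; the hardware conversion may differ. NaNs are not handled.
fn to_tf32(x: f32) -> f32 {
    f32::from_bits(x.to_bits() & 0xFFFF_E000)
}

/// Hypothetical scalar reference model of dst += a * b with TF32 inputs.
/// `a` is M x K, `b` is K x N, `dst` is M x N, all stored row-major.
fn mmul_tf32_ref(dst: &mut [Vec<f32>], a: &[Vec<f32>], b: &[Vec<f32>]) {
    let k_dim = b.len(); // K = number of rows of b
    for (m, row) in dst.iter_mut().enumerate() {
        assert_eq!(a[m].len(), k_dim, "a must have as many columns as b has rows");
        for (n, out) in row.iter_mut().enumerate() {
            // Start from the existing dst value: the operation accumulates.
            let mut acc = *out;
            for k in 0..k_dim {
                // Default f32 arithmetic rounds to nearest-even.
                acc += to_tf32(a[m][k]) * to_tf32(b[k][n]);
            }
            *out = acc;
        }
    }
}

fn main() {
    let a = vec![vec![1.0f32, 2.0], vec![3.0, 4.0]];
    let b = vec![vec![5.0f32, 6.0], vec![7.0, 8.0]];
    let mut dst = vec![vec![1.0f32; 2]; 2];
    mmul_tf32_ref(&mut dst, &a, &b);
    println!("{:?}", dst); // prints [[20.0, 23.0], [44.0, 51.0]]
}
```

Small integers are exact in TF32, so the example matches ordinary matrix multiplication plus the initial `dst` value of 1.0; inputs with more than 10 significant mantissa bits would lose precision in `to_tf32`.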
Output FP32 denormals are always flushed to zero; input single-precision denormals are always handled and are not treated as zero.
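The denormal rules above can be illustrated with a small helper that flushes only the stored result. `flush_to_zero` is a hypothetical name, and applying it to the final value is an assumption of this sketch; whether intermediate products are also flushed is not stated here. The example uses a subnormal input (2^-127) that is kept as-is, so scaling it up yields a normal result, while a result that is itself subnormal (2^-128) becomes zero.

```rust
/// Replace a subnormal f32 with a zero of the same sign; other values pass through.
/// Hypothetical helper modeling "output FP32 denormals are flushed to zero".
fn flush_to_zero(x: f32) -> f32 {
    if x.is_subnormal() {
        // Keep only the sign bit, giving +0.0 or -0.0.
        f32::from_bits(x.to_bits() & 0x8000_0000)
    } else {
        x
    }
}

fn main() {
    // A subnormal input: half of the smallest positive normal f32 (2^-127).
    let tiny = f32::MIN_POSITIVE / 2.0;
    assert!(tiny.is_subnormal());

    // The input is not treated as zero, so scaling it up gives a normal result (2^-125)...
    let kept = flush_to_zero(tiny * 4.0);
    // ...while a result that is itself subnormal (2^-128) is flushed to zero.
    let flushed = flush_to_zero(tiny * 0.5);

    println!("{} {}", kept == 2.0 * f32::MIN_POSITIVE, flushed == 0.0); // prints "true true"
}
```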
The shape of each tile is specified by the `__tile1024i` struct, and the tile register itself is allocated by the compiler.
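The fields of `__tile1024i` are not shown on this page, so the following sketch uses a hypothetical `TileShape` (row count and bytes per row) to check whether three tiles can be combined. It assumes the common AMX limits of 16 rows and 64 bytes per row (16 × 64 = 1024 bytes) and the standard matrix-multiply rule for 4-byte f32 elements: `a` is M×K, `b` is K×N, and `dst` is M×N. `TileShape` and `shapes_compatible` are illustrative names, not part of `core::arch`.

```rust
/// Hypothetical tile shape: number of rows and bytes per row.
/// ASSUMPTION: mirrors AMX's (rows, colsb) configuration; not a real std type.
#[derive(Clone, Copy, Debug)]
struct TileShape {
    rows: u16,
    colsb: u16,
}

const MAX_ROWS: u16 = 16;
const MAX_COLSB: u16 = 64; // 16 rows * 64 bytes = 1024 bytes per tile

/// Check that dst (M x N), a (M x K) and b (K x N), all holding f32
/// elements, have compatible shapes for dst += a * b.
fn shapes_compatible(dst: TileShape, a: TileShape, b: TileShape) -> bool {
    // Each tile must fit the assumed limits and hold whole f32 elements.
    let fits = |t: TileShape| t.rows <= MAX_ROWS && t.colsb <= MAX_COLSB && t.colsb % 4 == 0;
    fits(dst)
        && fits(a)
        && fits(b)
        && dst.rows == a.rows // M
        && dst.colsb == b.colsb // N * 4 bytes
        && a.colsb / 4 == b.rows // K
}

fn main() {
    let full = TileShape { rows: 16, colsb: 64 }; // 16 x 16 f32 elements
    let a = TileShape { rows: 16, colsb: 32 }; // 16 x 8
    let b = TileShape { rows: 8, colsb: 64 }; // 8 x 16
    println!("{}", shapes_compatible(full, full, full)); // prints true
    println!("{}", shapes_compatible(full, a, b)); // prints true
    println!("{}", shapes_compatible(full, a, full)); // prints false: K is 8 vs 16
}
```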