core::arch::x86_64

Function _tile_mmultf32ps

pub unsafe fn _tile_mmultf32ps(const DST: i32, const A: i32, const B: i32)

🔬 This is a nightly-only experimental API. (x86_amx_intrinsics #126622)

Available on x86-64 and target feature amx-tf32 only.


Performs matrix multiplication of two tiles, a and b, that contain packed single-precision (32-bit) floating-point elements, which are converted to TF32 (tensor-float32) format, and accumulates the results into the packed single-precision tile dst. The const parameters A, B and DST select the tile registers that hold a, b and dst. For each combination of (row of a, column of b), it performs the following steps:

Convert the elements of a and b to TF32.
Multiply the corresponding elements of a and b.
Accumulate the products into the corresponding row and column of dst, using round-to-nearest-even rounding.

Output FP32 denormals are always flushed to zero, while input single-precision denormals are always handled and never treated as zero.
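The per-element steps above can be modeled in portable scalar Rust, which is handy for computing expected results when testing code that uses the intrinsic. This is a minimal sketch, not the intrinsic itself and not Intel's reference pseudocode; the helper names `to_tf32`, `flush_denormal` and `mmul_tf32_reference` are hypothetical. It assumes that conversion to TF32 truncates the low 13 mantissa bits (the description above does not say how the conversion rounds), and it assumes the flush to zero is applied after every accumulation step. Ordinary row-major slices stand in for tile registers.

```rust
/// Reduce an f32 to TF32 precision (1 sign, 8 exponent, 10 explicit mantissa
/// bits) while keeping it stored as an f32.
/// ASSUMPTION: the low 13 mantissa bits are truncated (zeroed), not rounded.
fn to_tf32(x: f32) -> f32 {
    const LOW_13_BITS: u32 = 0x1FFF;
    let bits = x.to_bits();
    if x.is_nan() {
        // Set the quiet bit (bit 22) first so that zeroing the low bits cannot
        // turn a NaN whose payload lives only in those bits into infinity.
        return f32::from_bits((bits | 0x0040_0000) & !LOW_13_BITS);
    }
    f32::from_bits(bits & !LOW_13_BITS)
}

/// Flush a subnormal f32 to a zero of the same sign.
fn flush_denormal(x: f32) -> f32 {
    if x.is_subnormal() {
        0.0_f32.copysign(x)
    } else {
        x
    }
}

/// Scalar model: dst[m][n] += sum over kk of tf32(a[m][kk]) * tf32(b[kk][n]).
/// `a` is rows x k, `b` is k x cols and `dst` is rows x cols, all row-major.
/// ASSUMPTION: the flush to zero is applied after every accumulation step.
fn mmul_tf32_reference(
    dst: &mut [f32],
    a: &[f32],
    b: &[f32],
    rows: usize,
    k: usize,
    cols: usize,
) {
    assert_eq!(a.len(), rows * k);
    assert_eq!(b.len(), k * cols);
    assert_eq!(dst.len(), rows * cols);
    for m in 0..rows {
        for n in 0..cols {
            let mut acc = dst[m * cols + n];
            for kk in 0..k {
                // Two TF32 significands (11 bits each) multiply into at most 22 bits,
                // so barring overflow or underflow the product is exact in f32 and
                // only the addition rounds (f32 `+` rounds to nearest, ties to even).
                let prod = to_tf32(a[m * k + kk]) * to_tf32(b[kk * cols + n]);
                acc = flush_denormal(acc + prod);
            }
            dst[m * cols + n] = acc;
        }
    }
}
```

For example, with a = [[1, 2], [3, 4]], b = [[5, 6], [7, 8]] and a zeroed dst, the sketch produces [[19, 22], [43, 50]], while an input such as `f32::from_bits(0x3F80_0008)` (that is, 1 + 2^-20) is treated as exactly 1.0 because its extra mantissa bit lies below TF32 precision.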
