core::arch::x86_64

Function __tile_mmultf32ps

pub unsafe fn __tile_mmultf32ps(
    dst: *mut __tile1024i,
    a: __tile1024i,
    b: __tile1024i,
)

🔬This is a nightly-only experimental API. (x86_amx_intrinsics #126622)

Available on x86-64 and target feature amx-tf32 only.


Performs matrix multiplication of two tiles a and b, each containing packed single-precision (32-bit) floating-point elements that are converted to TF32 (tensor-float32) format, and accumulates the results into the packed single-precision tile dst. For each possible combination of (row of a, column of b), it performs the following steps:

convert the elements to TF32
multiply the corresponding elements of a and b
accumulate the results into the corresponding row and column of dst using round-to-nearest-even rounding mode

Output FP32 denormals are always flushed to zero. Input single-precision denormals are always handled and are not treated as zero. The shape of the tile is specified in the __tile1024i struct. The tile register is allocated by the compiler.
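The three steps above can be sketched as a scalar reference model in plain Rust. This is an illustration of the arithmetic only, not the intrinsic itself and not a bit-exact model of the hardware. The helper names (to_tf32, flush_denormal, tile_mmul_tf32_ref), the row-major slice layout, and the explicit m, k, n dimensions are hypothetical. The sketch also assumes that the FP32-to-TF32 conversion rounds to nearest even, that flushed results keep their sign, and that products are accumulated one at a time in FP32.

```rust
/// Round an f32 to TF32 precision (10 explicit mantissa bits), returned as an f32.
/// Assumption: the 13 dropped mantissa bits are rounded to nearest, ties to even.
fn to_tf32(x: f32) -> f32 {
    if !x.is_finite() {
        return x; // leave infinities and NaNs untouched in this sketch
    }
    let bits = x.to_bits();
    let lsb = (bits >> 13) & 1; // lowest mantissa bit that is kept
    let rounded = bits.wrapping_add(0x0FFF + lsb) & !0x1FFF;
    f32::from_bits(rounded)
}

/// Flush a subnormal result to zero.
/// Assumption: the sign of the flushed value is preserved.
fn flush_denormal(x: f32) -> f32 {
    if x.is_subnormal() { 0.0_f32.copysign(x) } else { x }
}

/// Hypothetical reference model: dst (m x n) += a (m x k) * b (k x n),
/// all stored row-major. The accumulation order is an assumption.
fn tile_mmul_tf32_ref(dst: &mut [f32], a: &[f32], b: &[f32], m: usize, k: usize, n: usize) {
    assert_eq!(dst.len(), m * n);
    assert_eq!(a.len(), m * k);
    assert_eq!(b.len(), k * n);
    for i in 0..m {
        for j in 0..n {
            let mut acc = dst[i * n + j];
            for p in 0..k {
                // Steps 1 and 2: convert both operands to TF32, then multiply.
                let prod = to_tf32(a[i * k + p]) * to_tf32(b[p * n + j]);
                // Step 3: accumulate in FP32, flushing subnormal outputs to zero.
                acc = flush_denormal(acc + prod);
            }
            dst[i * n + j] = acc;
        }
    }
}
```

For example, with 2x2 row-major inputs a = [1, 2, 3, 4] and b = [5, 6, 7, 8], and dst initially all ones, calling tile_mmul_tf32_ref(&mut dst, &a, &b, 2, 2, 2) leaves dst as [20, 23, 44, 51]. All of these values are exactly representable in TF32, so no rounding occurs.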
