#include <gromacs/gpu_utils/sycl_kernel_utils.h>

Description

Special packed Float3 flavor to help compiler optimizations on AMD CDNA2 devices.

Full FP32 performance on AMD CDNA2 devices, such as the MI200 series, can only be achieved when operating on float2 values in a SIMD2 fashion. The compiler (at least up to ROCm 5.6) can use packed math automatically for a normal Float3, but it generates a lot of data movement between normal and packed registers. Using this class helps avoid that problem.

The approach is based on a similar solution used by AMD and StreamHPC in their port.

Currently only used in NBNXM kernels, and only if GMX_NBNXM_ENABLE_PACKED_FLOAT3 is enabled.

Todo: This class should be removed as soon as the compiler is improved.

See issue #4854 for more details.
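
The layout idea described above can be illustrated with a minimal, portable sketch. This is not the GROMACS implementation: the names Float2Sketch, Float3Sketch, PackedFloat3Sketch and add are hypothetical, and a plain two-float struct stands in for Native_float2_ (which in the real header is a Clang ext_vector_type(2) vector). The sketch only shows how keeping x and y together as one pair, with z stored separately, lets the x/y arithmetic be expressed as a single two-wide operation; whether a given compiler actually emits packed instructions for it is not something this sketch demonstrates.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for Native_float2_. The real type is a Clang vector
// (ext_vector_type(2)); a plain struct keeps this sketch portable.
struct Float2Sketch
{
    float x;
    float y;
};

// Hypothetical stand-in for the regular three-component Float3.
struct Float3Sketch
{
    float x, y, z;
};

// Hypothetical analogue of the packed flavor: x and y live in one pair, z apart.
struct PackedFloat3Sketch
{
    Float2Sketch xy_;
    float        z_;

    PackedFloat3Sketch(float x, float y, float z) : xy_{ x, y }, z_(z) {}
    PackedFloat3Sketch(Float2Sketch xy, float z) : xy_(xy), z_(z) {}
    PackedFloat3Sketch(Float3Sketch r) : xy_{ r.x, r.y }, z_(r.z) {}

    float        x() const { return xy_.x; }
    float        y() const { return xy_.y; }
    Float2Sketch xy() const { return xy_; }
    float        z() const { return z_; }

    // Read-only component access; indices 0, 1, 2 map to x, y, z.
    float operator[](std::size_t i) const
    {
        assert(i < 3);
        return i == 0 ? xy_.x : (i == 1 ? xy_.y : z_);
    }

    // Conversion back to the regular three-component type.
    operator Float3Sketch() const { return Float3Sketch{ xy_.x, xy_.y, z_ }; }
};

// Component-wise addition written so that x and y are handled as one pair.
inline PackedFloat3Sketch add(const PackedFloat3Sketch& a, const PackedFloat3Sketch& b)
{
    const Float2Sketch xy{ a.x() + b.x(), a.y() + b.y() };
    return PackedFloat3Sketch(xy, a.z() + b.z());
}
```

In this sketch, code that needs the regular type converts at the boundary (for example, Float3Sketch r = add(a, b);), which mirrors the conversion constructor and conversion operator listed for the real struct.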

Public Types
typedef float	__attribute__ ((ext_vector_type(2))) Native_float2_

Public Member Functions
struct	__attribute__ ((packed))

template<typename Index >
	__attribute__ ((always_inline)) float operator[](Index i) const

template<typename Index >
	__attribute__ ((always_inline)) float &operator[](Index i)

	__attribute__ ((always_inline)) float x() const

	__attribute__ ((always_inline)) float y() const

	__attribute__ ((always_inline)) Native_float2_ xy() const

	__attribute__ ((always_inline)) float z() const

	AmdPackedFloat3 (float x, float y, float z)

	AmdPackedFloat3 (Native_float2_ xy, float z)

	AmdPackedFloat3 (Float3 r)

	operator Float3 () const

The documentation for this struct was generated from the following file:

src/gromacs/gpu_utils/sycl_kernel_utils.h
