Gromacs 2024.4
AmdPackedFloat3
#include <gromacs/gpu_utils/sycl_kernel_utils.h>
Special packed flavor of Float3 that helps compiler optimizations on AMD CDNA2 devices.
Full FP32 performance on AMD CDNA2 devices, such as the MI200 series, can only be achieved when operating on float2 values in a SIMD2 fashion. The compiler (at least up to ROCm 5.6) can use packed math automatically for a normal Float3, but in doing so it generates a lot of data movement between normal and packed registers. This class helps avoid the problem by keeping the x and y components together in a native two-wide vector (Native_float2_).
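The layout idea in the paragraph above can be sketched as follows. This is a minimal illustration, not the GROMACS implementation: the names Float2Vec, PackedLayoutSketch, UnpackedLayoutSketch and scaledAdd are hypothetical, and the GCC vector_size fallback is an assumption added only so the sketch also builds outside Clang. Because x and y live in one two-wide vector, their arithmetic is a single elementwise vector operation that a compiler can map to packed FP32 instructions, while z stays an ordinary scalar. The packed attribute keeps the struct at 12 bytes; without it, the 8-byte alignment of the vector member pads the struct to 16 bytes.

```cpp
#if defined(__clang__)
// Clang extended vector type, the same kind of type as Native_float2_.
typedef float Float2Vec __attribute__((ext_vector_type(2)));
#else
// Assumed fallback for GCC: a generic 8-byte vector of two floats.
typedef float Float2Vec __attribute__((vector_size(2 * sizeof(float))));
#endif

// Hypothetical packed layout: x and y share one vector, z is a scalar.
struct __attribute__((packed)) PackedLayoutSketch
{
    Float2Vec xy;
    float     z;
};

// Same members without the attribute: the 8-byte-aligned vector pads it to 16 bytes.
struct UnpackedLayoutSketch
{
    Float2Vec xy;
    float     z;
};

static_assert(sizeof(PackedLayoutSketch) == 3 * sizeof(float), "packed: 12 bytes, like float[3]");
static_assert(sizeof(UnpackedLayoutSketch) == 4 * sizeof(float), "unpacked: padded to 16 bytes");

// r = a + s * b: one two-wide operation for (x, y), one scalar operation for z.
inline PackedLayoutSketch scaledAdd(PackedLayoutSketch a, float s, PackedLayoutSketch b)
{
    PackedLayoutSketch r;
    r.xy = a.xy + s * b.xy; // elementwise over both lanes
    r.z  = a.z + s * b.z;   // plain scalar math for the third component
    return r;
}
```

For example, with a = (1, 2, 3), b = (10, 20, 30) and s = 0.5, scaledAdd returns (6, 12, 18). Whether packed instructions are actually emitted depends on the compiler and target.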
The approach is based on a similar solution used by AMD and StreamHPC in their port.
It is currently used only in the NBNXM kernels, and only when GMX_NBNXM_ENABLE_PACKED_FLOAT3 is enabled.
See issue #4854 for more details.
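A compile-time switch of this kind can be sketched as below, assuming GMX_NBNXM_ENABLE_PACKED_FLOAT3 reaches the code as a 0/1 preprocessor macro (defaulted to 0 in this sketch). PlainFloat3, PackedFloat3Standin, KernelFloat3 and kPackedFloat3Enabled are hypothetical stand-ins, not GROMACS types; the point is only that kernel code written against one alias can be switched between the plain and the packed layout without changing the 12-byte size of each coordinate.

```cpp
#ifndef GMX_NBNXM_ENABLE_PACKED_FLOAT3
#    define GMX_NBNXM_ENABLE_PACKED_FLOAT3 0 // assumption: disabled unless set by the build
#endif

#if defined(__clang__)
typedef float Float2Vec __attribute__((ext_vector_type(2)));
#else
// Assumed GCC fallback with the same 8-byte size.
typedef float Float2Vec __attribute__((vector_size(2 * sizeof(float))));
#endif

// Hypothetical stand-in for Float3: three independent scalars.
struct PlainFloat3
{
    float x, y, z;
};

// Hypothetical stand-in for AmdPackedFloat3: x and y share one vector.
struct __attribute__((packed)) PackedFloat3Standin
{
    Float2Vec xy;
    float     z;
};

// Kernel code uses one alias; the flag decides which layout it refers to.
#if GMX_NBNXM_ENABLE_PACKED_FLOAT3
using KernelFloat3 = PackedFloat3Standin;
#else
using KernelFloat3 = PlainFloat3;
#endif

constexpr bool kPackedFloat3Enabled = (GMX_NBNXM_ENABLE_PACKED_FLOAT3 != 0);

// Both candidate layouts occupy 12 bytes per coordinate.
static_assert(sizeof(PlainFloat3) == 3 * sizeof(float), "plain layout is 12 bytes");
static_assert(sizeof(PackedFloat3Standin) == 3 * sizeof(float), "packed layout is 12 bytes");
```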
Public Types
typedef float __attribute__((ext_vector_type(2))) Native_float2_
Public Member Functions
struct __attribute__((packed))
template<typename Index> __attribute__((always_inline)) float operator[](Index i) const
template<typename Index> __attribute__((always_inline)) float& operator[](Index i)
__attribute__((always_inline)) float x() const
__attribute__((always_inline)) float y() const
__attribute__((always_inline)) Native_float2_ xy() const
__attribute__((always_inline)) float z() const
AmdPackedFloat3(float x, float y, float z)
AmdPackedFloat3(Native_float2_ xy, float z)
AmdPackedFloat3(Float3 r)
operator Float3() const
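The accessors, constructors and conversion operator listed above can be illustrated with the following minimal sketch, assuming packed storage made of an 8-byte xy vector followed by z. PackedFloat3Sketch, Float2Vec and Float3Standin (used in place of the real Float3) are hypothetical names, the index mapping (0 to x, 1 to y, anything else to z) is an assumption, and the mutable float& overload of operator[] is left out. This is not the GROMACS source.

```cpp
#if defined(__clang__)
typedef float Float2Vec __attribute__((ext_vector_type(2)));
#else
// Assumed GCC fallback with the same 8-byte size.
typedef float Float2Vec __attribute__((vector_size(2 * sizeof(float))));
#endif

// Hypothetical stand-in for the plain Float3 type.
struct Float3Standin
{
    float v[3];
    float operator[](int i) const { return v[i]; }
};

// Hypothetical sketch of the listed interface.
class PackedFloat3Sketch
{
public:
    PackedFloat3Sketch(float x, float y, float z) : PackedFloat3Sketch(makeXy(x, y), z) {}
    PackedFloat3Sketch(Float2Vec xy, float z)
    {
        data_.xy = xy;
        data_.z  = z;
    }
    // Not explicit, so a plain vector converts implicitly.
    PackedFloat3Sketch(Float3Standin r) : PackedFloat3Sketch(r[0], r[1], r[2]) {}

    // Read access by index: 0 -> x, 1 -> y, anything else -> z.
    template<typename Index>
    float operator[](Index i) const
    {
        switch (i)
        {
            case 0: return x();
            case 1: return y();
            default: return z();
        }
    }

    // Copy the vector out of the packed storage before reading a lane.
    float x() const { const Float2Vec v = data_.xy; return v[0]; }
    float y() const { const Float2Vec v = data_.xy; return v[1]; }
    Float2Vec xy() const { return data_.xy; }
    float z() const { return data_.z; }

    // Convert back to the plain three-scalar representation.
    operator Float3Standin() const { return Float3Standin{ { x(), y(), z() } }; }

private:
    static Float2Vec makeXy(float x, float y)
    {
        Float2Vec v = { x, y };
        return v;
    }

    // 8-byte vector followed by a 4-byte scalar, packed to 12 bytes.
    struct __attribute__((packed))
    {
        Float2Vec xy;
        float     z;
    } data_;
};

static_assert(sizeof(PackedFloat3Sketch) == 3 * sizeof(float), "12-byte layout");
```

A round trip such as PackedFloat3Sketch q = plain; followed by Float3Standin back = q; yields the original three components, since the conversions only regroup the values without any arithmetic.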