Gromacs  2024.4
AmdPackedFloat3 Struct Reference

#include <gromacs/gpu_utils/sycl_kernel_utils.h>

Description

Special packed Float3 flavor to help compiler optimizations on AMD CDNA2 devices.

The full FP32 performance of AMD CDNA2 devices, such as the MI200 series, can only be achieved when operating on float2 values in a SIMD2 fashion. The compiler (at least up to ROCm 5.6) can use packed math automatically for a normal Float3, but it generates a lot of data movement between normal and packed registers. Using this class helps avoid that problem.
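
The layout idea can be illustrated with a minimal, portable sketch. It is not the GROMACS implementation: Float2Sketch and PackedFloat3Sketch are hypothetical names, and a plain two-float struct stands in for the Clang ext_vector_type(2) vector that the real struct uses. The point is only that x and y travel together as one 2-wide value, so an operation on them maps to a single packed operation, while z stays a separate scalar.

```cpp
#include <cassert>

// Hypothetical stand-in for the native 2-wide float vector.
// The real struct uses a Clang ext_vector_type(2) float instead.
struct Float2Sketch
{
    float x, y;
};

// Element-wise add on the pair: the work a packed FP32 instruction
// would do on both lanes at once.
inline Float2Sketch operator+(Float2Sketch a, Float2Sketch b)
{
    return { a.x + b.x, a.y + b.y };
}

// Hypothetical sketch of the packed layout: x and y share one
// 2-wide value, z is kept as a separate scalar.
struct PackedFloat3Sketch
{
    Float2Sketch xy;
    float        z;
};

// Three floats, no padding expected on mainstream ABIs.
static_assert(sizeof(PackedFloat3Sketch) == 3 * sizeof(float),
              "sketch assumes a padding-free layout");

// One pair-wise add plus one scalar add, rather than three scalar adds.
inline PackedFloat3Sketch operator+(PackedFloat3Sketch a, PackedFloat3Sketch b)
{
    return { a.xy + b.xy, a.z + b.z };
}
```

Whether the pair-wise add actually becomes a packed instruction depends on the compiler and target; the sketch only shows the data arrangement that makes it possible.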

The approach is based on the similar solution used by AMD and StreamHPC in their port.

Currently used only in the NBNXM kernels, and only if GMX_NBNXM_ENABLE_PACKED_FLOAT3 is enabled.

Todo:
This class shall be removed as soon as the compiler is improved.

See issue #4854 for more details.

Public Types

typedef float __attribute__ ((ext_vector_type(2))) Native_float2_
 

Public Member Functions

struct __attribute__ ((packed))
 
template<typename Index >
 __attribute__ ((always_inline)) float operator[](Index i) const
 
template<typename Index >
 __attribute__ ((always_inline)) float &operator[](Index i)
 
 __attribute__ ((always_inline)) float x() const
 
 __attribute__ ((always_inline)) float y() const
 
 __attribute__ ((always_inline)) Native_float2_ xy() const
 
 __attribute__ ((always_inline)) float z() const
 
 AmdPackedFloat3 (float x, float y, float z)
 
 AmdPackedFloat3 (Native_float2_ xy, float z)
 
 AmdPackedFloat3 (Float3 r)
 
 operator Float3 () const
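
As a rough illustration of how the members listed above fit together, the following portable sketch mirrors the interface with hypothetical stand-in types (Float3Sketch, Float2Sketch, PackedFloat3Sketch). The member bodies are assumptions chosen for illustration, and the packed and always_inline attributes of the real struct are omitted, so this should not be read as the GROMACS source.

```cpp
#include <cassert>

// Hypothetical stand-ins for Float3 and Native_float2_.
struct Float3Sketch { float x, y, z; };
struct Float2Sketch { float x, y; };

class PackedFloat3Sketch
{
public:
    PackedFloat3Sketch(float x, float y, float z) : xy_{ x, y }, z_(z) {}
    PackedFloat3Sketch(Float2Sketch xy, float z) : xy_(xy), z_(z) {}
    PackedFloat3Sketch(Float3Sketch r) : xy_{ r.x, r.y }, z_(r.z) {}

    float        x() const { return xy_.x; }
    float        y() const { return xy_.y; }
    Float2Sketch xy() const { return xy_; }
    float        z() const { return z_; }

    // Read access by index: 0 and 1 select from the pair, 2 selects z.
    template<typename Index>
    float operator[](Index i) const
    {
        switch (i)
        {
            case 0: return xy_.x;
            case 1: return xy_.y;
            default: assert(i == 2); return z_;
        }
    }

    // Write access by index, same mapping as above.
    template<typename Index>
    float& operator[](Index i)
    {
        switch (i)
        {
            case 0: return xy_.x;
            case 1: return xy_.y;
            default: assert(i == 2); return z_;
        }
    }

    // Conversion back to the ordinary three-component type.
    operator Float3Sketch() const { return { xy_.x, xy_.y, z_ }; }

private:
    Float2Sketch xy_; // x and y, kept together as one 2-wide value
    float        z_;  // z, kept as a scalar
};
```

With these definitions, a value can be built from three floats, modified through `operator[]`, and converted back to the three-component type by plain assignment, which is what lets calling code switch between the two representations with few changes.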
 

The documentation for this struct was generated from the following file: