Gromacs  2025.0-dev-20241014-f673b97
gpu_3dfft_sycl_rocfft.cpp File Reference
#include "gmxpre.h"
#include "gpu_3dfft_sycl_rocfft.h"
#include <vector>
#include "gromacs/gpu_utils/device_stream.h"
#include "gromacs/gpu_utils/devicebuffer.h"
#include "gromacs/utility/enumerationhelpers.h"
#include "gromacs/utility/exceptions.h"
#include "gromacs/utility/gmxassert.h"
#include "rocfft_common_utils.h"

Description

Implements GPU 3D FFT routines for hipSYCL via rocFFT.

Author
Andrey Alekseenko <al42and@gmail.com>
Mark Abraham <mark.j.abraham@gmail.com>

For hipSYCL, calling the vendor FFT APIs on the same DeviceStream as other operations relies on a hipSYCL extension called "custom operations" (see hipSYCL doc/enqueue-custom-operation.md). This effectively enqueues an asynchronous host-side lambda into the same queue. The body of the lambda unpacks the runtime data structures to get the native handles and calls the native FFT APIs.
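
The ordering idea behind custom operations can be illustrated with a host-only toy model. This is a minimal sketch, not hipSYCL or rocFFT code: `ToyQueue`, `InteropHandle`, `NativeStream`, and `runToyPipeline` are hypothetical names invented for this illustration, and the "FFT" is only a log entry. It shows a host-side callable being recorded in an in-order queue between other operations and receiving a native handle only when it actually runs.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a native stream handle (e.g. a HIP stream).
struct NativeStream
{
    int id;
};

// Hypothetical stand-in for the interop handle a runtime passes to a custom operation.
struct InteropHandle
{
    NativeStream stream;
};

// Toy in-order queue: operations are recorded on submission and run later, in order.
class ToyQueue
{
public:
    explicit ToyQueue(int streamId) : stream_{ streamId } {}

    // Enqueue a host-side callable that receives the native handle when it runs.
    void enqueueCustomOperation(std::function<void(InteropHandle&)> op)
    {
        pending_.push_back(std::move(op));
    }

    // Run everything submitted so far, in submission order.
    void wait()
    {
        InteropHandle handle{ stream_ };
        for (auto& op : pending_)
        {
            op(handle);
        }
        pending_.clear();
    }

private:
    NativeStream                                     stream_;
    std::vector<std::function<void(InteropHandle&)>> pending_;
};

// Submit "spread", a mock FFT call, and "solve" to one queue; return the order in which they ran.
std::vector<std::string> runToyPipeline()
{
    std::vector<std::string> log;
    ToyQueue                 queue(7);
    queue.enqueueCustomOperation([&log](InteropHandle&) { log.push_back("spread"); });
    queue.enqueueCustomOperation([&log](InteropHandle& h) {
        // A real implementation would call the vendor FFT API here using the native handle.
        log.push_back("fft on stream " + std::to_string(h.stream.id));
    });
    queue.enqueueCustomOperation([&log](InteropHandle&) { log.push_back("solve"); });
    assert(log.empty()); // nothing has run yet: submission only records the work
    queue.wait();
    return log;
}
```

Calling `runToyPipeline()` yields the three entries in submission order, with the mock FFT step seeing the handle of the queue it was submitted to. A real SYCL runtime executes the work asynchronously on a device rather than inside `wait()`; the toy model only captures the ordering and handle-passing aspects.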

hipSYCL queues operate at a higher level of abstraction than HIP streams: the runtime distributes queued work across HIP streams to balance load. It is possible to set the HIP stream in rocfft_execution_info, but there is no guarantee that a subsequent queue item will run on the same stream, so we currently do not attempt to set the stream.

Classes

class  gmx::Gpu3dFft::ImplSyclRocfft::Impl
 Impl class.
 

Functions

RocfftPlan gmx::anonymous_namespace{gpu_3dfft_sycl_rocfft.cpp}::makePlan (const std::string &descriptiveString, rocfft_transform_type transformType, const PlanSetupData &inputPlanSetupData, const PlanSetupData &outputPlanSetupData, ArrayRef< const size_t > rocfftRealGridSize, const DeviceStream &pmeStream)
 Prepare plans for the forward and reverse transformation.
 
 gmx::makePlan ("complex-to-real", rocfft_transform_type_real_inverse, PlanSetupData{rocfft_array_type_hermitian_interleaved, makeComplexStrides(complexGridSizePadded), computeTotalSize(complexGridSizePadded)}, PlanSetupData{rocfft_array_type_real, makeRealStrides(realGridSizePadded), computeTotalSize(realGridSizePadded)}, std::vector< size_t >{size_t(realGridSize[ZZ]), size_t(realGridSize[YY]), size_t(realGridSize[XX])}, pmeStream)
 
 gmx::GMX_RELEASE_ASSERT (performOutOfPlaceFFT,"Only out-of-place FFT is implemented in hipSYCL")
 
 gmx::GMX_RELEASE_ASSERT (allocateRealGrid==false,"Grids need to be pre-allocated")
 
 gmx::GMX_RELEASE_ASSERT (gridSizesInXForEachRank.size()==1 &&gridSizesInYForEachRank.size()==1,"FFT decomposition not implemented with the SYCL rocFFT backend")
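
The makePlan call listed above passes the real grid lengths in the order ZZ, YY, XX, together with strides and a total size derived from the padded grid. The following is a minimal sketch of how such values could be computed, assuming a grid indexed as {XX, YY, ZZ} whose ZZ dimension is contiguous in memory. The helper names `fastestFirstLengths`, `fastestFirstStrides`, and `totalElements` are hypothetical and are not the GROMACS helpers `makeRealStrides`, `makeComplexStrides`, or `computeTotalSize`, whose actual definitions are not shown on this page.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Grid dimensions indexed as {XX, YY, ZZ}; ZZ is assumed to be the fastest-varying in memory.
using GridSize = std::array<std::size_t, 3>;

// Reorder logical lengths so that the fastest-varying dimension comes first: {ZZ, YY, XX}.
GridSize fastestFirstLengths(const GridSize& gridSize)
{
    return { gridSize[2], gridSize[1], gridSize[0] };
}

// Element strides for a padded grid, in the same fastest-first order.
// Moving one step in ZZ skips 1 element, in YY a padded ZZ row, in XX a padded YY*ZZ plane.
GridSize fastestFirstStrides(const GridSize& paddedSize)
{
    return { 1, paddedSize[2], paddedSize[2] * paddedSize[1] };
}

// Number of elements needed to store the padded grid.
std::size_t totalElements(const GridSize& paddedSize)
{
    return paddedSize[0] * paddedSize[1] * paddedSize[2];
}
```

For example, a 4 x 5 x 6 grid padded to 4 x 5 x 8 would give lengths {6, 5, 4}, strides {1, 8, 40}, and 160 elements in total under these assumptions.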
 

Variables

 gmx::pmeStream