Gromacs  2026.0-dev-20241121-c76fa1e
 All Classes Namespaces Files Functions Variables Typedefs Enumerations Enumerator Friends Macros Groups Pages
Functions | Variables
#include "gmxpre.h"
#include "pme.h"
#include "config.h"
#include <cassert>
#include <cmath>
#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <algorithm>
#include <array>
#include <filesystem>
#include <list>
#include <tuple>
#include <utility>
#include "gromacs/domdec/domdec.h"
#include "gromacs/ewald/ewald_utils.h"
#include "gromacs/ewald/pme_output.h"
#include "gromacs/fft/fft.h"
#include "gromacs/fft/parallel_3dfft.h"
#include "gromacs/fileio/pdbio.h"
#include "gromacs/gmxlib/network.h"
#include "gromacs/gmxlib/nrnb.h"
#include "gromacs/gpu_utils/hostallocator.h"
#include "gromacs/hardware/hw_info.h"
#include "gromacs/math/boxmatrix.h"
#include "gromacs/math/functions.h"
#include "gromacs/math/gmxcomplex.h"
#include "gromacs/math/units.h"
#include "gromacs/math/vec.h"
#include "gromacs/math/vectypes.h"
#include "gromacs/mdtypes/commrec.h"
#include "gromacs/mdtypes/forcerec.h"
#include "gromacs/mdtypes/inputrec.h"
#include "gromacs/mdtypes/md_enums.h"
#include "gromacs/mdtypes/simulation_workload.h"
#include "gromacs/pbcutil/pbc.h"
#include "gromacs/timing/cyclecounter.h"
#include "gromacs/timing/wallcycle.h"
#include "gromacs/timing/walltime_accounting.h"
#include "gromacs/topology/topology.h"
#include "gromacs/utility/alignedallocator.h"
#include "gromacs/utility/arrayref.h"
#include "gromacs/utility/basedefinitions.h"
#include "gromacs/utility/cstringutil.h"
#include "gromacs/utility/exceptions.h"
#include "gromacs/utility/fatalerror.h"
#include "gromacs/utility/gmxassert.h"
#include "gromacs/utility/gmxmpi.h"
#include "gromacs/utility/gmxomp.h"
#include "gromacs/utility/iserializer.h"
#include "gromacs/utility/logger.h"
#include "gromacs/utility/message_string_collector.h"
#include "gromacs/utility/real.h"
#include "gromacs/utility/stringutil.h"
#include "gromacs/utility/unique_cptr.h"
#include "calculate_spline_moduli.h"
#include "pme_gather.h"
#include "pme_gpu_internal.h"
#include "pme_grid.h"
#include "pme_internal.h"
#include "pme_redistribute.h"
#include "pme_solve.h"
#include "pme_spline_work.h"
#include "pme_spread.h"
+ Include dependency graph for pme.cpp:

Description

This file contains function definitions necessary for computing energies and forces for the PME long-ranged part (Coulomb and LJ).

Author
Erik Lindahl erik@.nosp@m.kth..nosp@m.se
Berk Hess hess@.nosp@m.kth..nosp@m.se

Functions

bool pme_gpu_supports_build (std::string *error)
 Checks whether the GROMACS build allows to run PME on GPU. TODO: this partly duplicates an internal PME assert function pme_gpu_check_restrictions(), except that works with a formed gmx_pme_t structure. Should that one go away/work with inputrec? More...
 
bool pme_gpu_supports_input (const t_inputrec &ir, std::string *error)
 Checks whether the input system allows to run PME on GPU. TODO: this partly duplicates an internal PME assert function pme_gpu_check_restrictions(), except that works with a formed gmx_pme_t structure. Should that one go away/work with inputrec? More...
 
bool pme_gpu_mixed_mode_supports_input (const t_inputrec &ir, std::string *error)
 Checks whether the input system allows to run PME on GPU in Mixed mode. Assumes that the input system is compatible with GPU PME otherwise, that is, before calling this function one should check that pme_gpu_supports_input returns true. More...
 
static bool pme_gpu_check_restrictions (const gmx_pme_t *pme, std::string *error)
 Finds out if PME with given inputs is possible to run on GPU. This function is an internal final check, validating the whole PME structure on creation, but it still duplicates the preliminary checks from the above (externally exposed) pme_gpu_supports_input() - just in case. More...
 
PmeRunMode pme_run_mode (const gmx_pme_t *pme)
 Returns the active PME codepath (CPU, GPU, mixed). More...
 
gmx::PinningPolicy pme_get_pinning_policy ()
 Return the pinning policy appropriate for this build configuration for relevant buffers used for PME task on this rank (e.g. running on a GPU).
 
static void setup_coordinate_communication (PmeAtomComm *atc)
 Set up coordinate communication.
 
static int mult_up (int n, int f)
 Round n up to the next multiple of f.
 
static double estimate_pme_load_imbalance (struct gmx_pme_t *pme)
 Return estimate of the load imbalance from the PME grid not being a good match for the number of PME ranks.
 
static void init_overlap_comm (pme_overlap_t *ol, int norder, MPI_Comm comm, int nnodes, int nodeid, int ndata, int commplainsize)
 Initialize data structure for communication.
 
int minimalPmeGridSize (int pmeOrder)
 Return the smallest allowed PME grid size for pmeOrder.
 
int numGridLinesForExtendedHaloRegion (int pmeOrder, real haloExtentForAtomDisplacement, real gridSpacing)
 Calculates the number of grid lines used as halo region for PME decomposition. More...
 
real getGridSpacingFromBox (const matrix box, const ivec gridDim)
 Gets grid spacing from simulation box and grid dimension. More...
 
bool gmx_pme_check_restrictions (int pme_order, int nkx, int nky, int nkz, int numPmeDomainsAlongX, int numPmeDomainsAlongY, int extendedHaloRegion, bool useGpuPme, bool useThreads, bool errorsAreFatal)
 Check restrictions on pme_order and the PME grid nkx,nky,nkz. More...
 
static void initGrids (gmx::ArrayRef< PmeAndFftGrids > gridsSet, const gmx_pme_t &pme, const bool requestReproducibility, gmx::ArrayRef< std::vector< AlignedVector< real >>> gridsStorage)
 
gmx_pme_t * gmx_pme_init (const t_commrec *cr, const NumPmeDomains &numPmeDomains, const t_inputrec *ir, const matrix box, real haloExtentForAtomDisplacement, gmx_bool bFreeEnergy_q, gmx_bool bFreeEnergy_lj, gmx_bool bReproducible, real ewaldcoeff_q, real ewaldcoeff_lj, int nthread, PmeRunMode runMode, PmeGpu *pmeGpu, const DeviceContext *deviceContext, const DeviceStream *deviceStream, const PmeGpuProgram *pmeGpuProgram, const gmx::MDLogger &mdlog, std::shared_ptr< PmeGridsStorage > pmeGridsStoragePtr)
 Construct PME data. More...
 
void gmx_pme_reinit (struct gmx_pme_t **pmedata, const t_commrec *cr, struct gmx_pme_t *pme_src, const t_inputrec *ir, const ivec grid_size, real ewaldcoeff_q, real ewaldcoeff_lj)
 As gmx_pme_init, but takes most settings, except the grid/Ewald coefficients, and the shared grid storage from pme_src.
 
real gmx_pme_calc_energy (gmx_pme_t *pme, gmx::ArrayRef< const gmx::RVec > x, gmx::ArrayRef< const real > q)
 Calculate the PME grid energy V for n charges. More...
 
static void calc_initial_lb_coeffs (gmx::ArrayRef< real > coefficient, gmx::ArrayRef< const real > local_c6, gmx::ArrayRef< const real > local_sigma)
 Calculate initial Lorentz-Berthelot coefficients for LJ-PME.
 
static void calc_next_lb_coeffs (gmx::ArrayRef< real > coefficient, gmx::ArrayRef< const real > local_sigma)
 Calculate next Lorentz-Berthelot coefficients for LJ-PME.
 
int gmx_pme_do (struct gmx_pme_t *pme, gmx::ArrayRef< const gmx::RVec > coordinates, gmx::ArrayRef< gmx::RVec > forces, gmx::ArrayRef< const real > chargeA, gmx::ArrayRef< const real > chargeB, gmx::ArrayRef< const real > c6A, gmx::ArrayRef< const real > c6B, gmx::ArrayRef< const real > sigmaA, gmx::ArrayRef< const real > sigmaB, const matrix box, const t_commrec *cr, int maxshift_x, int maxshift_y, t_nrnb *nrnb, gmx_wallcycle *wcycle, matrix vir_q, matrix vir_lj, real *energy_q, real *energy_lj, real lambda_q, real lambda_lj, real *dvdlambda_q, real *dvdlambda_lj, const gmx::StepWorkload &stepWork)
 Do a PME calculation on a CPU for the long range electrostatics and/or LJ. More...
 
void parallel_3dfft_destroy (gmx_parallel_3dfft *pfft_setup)
 
void gmx_pme_destroy (gmx_pme_t *pme)
 Destroys the PME data structure (including GPU data).
 
void gmx_pme_destroy (gmx_pme_t *pme, bool destroyGpuData)
 Destroys the PME data structure. More...
 
void gmx_pme_reinit_atoms (gmx_pme_t *pme, const int numAtoms, gmx::ArrayRef< const real > chargesA, gmx::ArrayRef< const real > chargesB)
 This function updates the local atom data on GPU after DD (charges, coordinates, etc.). TODO: it should update the PME CPU atom data as well. (currently PME CPU call gmx_pme_do() gets passed the input pointers for each computation). More...
 
bool gmx_pme_grid_matches (const gmx_pme_t &pme, const ivec grid_size)
 Return whether the grid of pme is identical to grid_size.
 

Variables

const int gmxCacheLineSize = 64
 Number of bytes in a cache line. More...
 

Function Documentation

real getGridSpacingFromBox ( const matrix  box,
const ivec  gridDim 
)

Gets grid spacing from simulation box and grid dimension.

Parameters
[in]boxSimulation box
[in]gridDimFourier grid dimension
Returns
Grid spacing value
real gmx_pme_calc_energy ( gmx_pme_t *  pme,
gmx::ArrayRef< const gmx::RVec x,
gmx::ArrayRef< const real q 
)

Calculate the PME grid energy V for n charges.

The potential (found in pme) must have been found already with a call to gmx_pme_do(). Note that the charges are not spread on the grid in the pme struct. Currently does not work in parallel or with free energy.

bool gmx_pme_check_restrictions ( int  pme_order,
int  nkx,
int  nky,
int  nkz,
int  numPmeDomainsAlongX,
int  numPmeDomainsAlongY,
int  extendedHaloRegion,
bool  useGpuPme,
bool  useThreads,
bool  errorsAreFatal 
)

Check restrictions on pme_order and the PME grid nkx,nky,nkz.

With errorsAreFatal=true, an exception or fatal error is generated on violation of restrictions. With errorsAreFatal=false, false is returned on violation of restrictions. When all restrictions are obeyed, true is returned. Argument useThreads tells if any MPI rank doing PME uses more than 1 threads. If at calling useThreads is unknown, pass true for conservative checking.

The PME GPU restrictions are checked separately during pme_gpu_init().

void gmx_pme_destroy ( gmx_pme_t *  pme,
bool  destroyGpuData 
)

Destroys the PME data structure.

Parameters
pmeThe data structure to destroy.
destroyGpuDataSet to false if pme is a copy created by gmx_pme_reinit, and only it should be destroyed, while shared GPU data should be preserved. If true, the shared data will be destroyed, like in gmx_pme_destroy(gmx_pme_t* pme).
int gmx_pme_do ( struct gmx_pme_t *  pme,
gmx::ArrayRef< const gmx::RVec coordinates,
gmx::ArrayRef< gmx::RVec forces,
gmx::ArrayRef< const real chargeA,
gmx::ArrayRef< const real chargeB,
gmx::ArrayRef< const real c6A,
gmx::ArrayRef< const real c6B,
gmx::ArrayRef< const real sigmaA,
gmx::ArrayRef< const real sigmaB,
const matrix  box,
const t_commrec *  cr,
int  maxshift_x,
int  maxshift_y,
t_nrnb *  nrnb,
gmx_wallcycle *  wcycle,
matrix  vir_q,
matrix  vir_lj,
real energy_q,
real energy_lj,
real  lambda_q,
real  lambda_lj,
real dvdlambda_q,
real dvdlambda_lj,
const gmx::StepWorkload stepWork 
)

Do a PME calculation on a CPU for the long range electrostatics and/or LJ.

Computes the PME forces and the energy and viral, when requested, for all atoms in coordinates. Forces, when requested, are added to the buffer forces, which is allowed to contain more elements than the number of elements in coordinates. The meaning of flags is defined above, and determines which parts of the calculation are performed.

Returns
0 indicates all well, non zero is an error code.
gmx_pme_t* gmx_pme_init ( const t_commrec *  cr,
const NumPmeDomains numPmeDomains,
const t_inputrec *  ir,
const matrix  box,
real  haloExtentForAtomDisplacement,
gmx_bool  bFreeEnergy_q,
gmx_bool  bFreeEnergy_lj,
gmx_bool  bReproducible,
real  ewaldcoeff_q,
real  ewaldcoeff_lj,
int  nthread,
PmeRunMode  runMode,
PmeGpu pmeGpu,
const DeviceContext *  deviceContext,
const DeviceStream deviceStream,
const PmeGpuProgram pmeGpuProgram,
const gmx::MDLogger mdlog,
std::shared_ptr< PmeGridsStorage >  pmeGridsStoragePtr 
)

Construct PME data.

Exceptions
gmx::InconsistentInputErrorif input grid sizes/PME order are inconsistent.
Returns
Pointer to newly allocated and initialized PME data.

pmeGridsStorage can be nullptr, in which case new PmeGridsStorage is allocated. When pmeGridsStorage is not a nullptr, grid storage is taken from there.

Todo:
We should evolve something like a GpuManager that holds DeviceInformation* and PmeGpuProgram* and perhaps other related things whose lifetime can/should exceed that of a task (or perhaps task manager). See Issue #2522.
void gmx_pme_reinit_atoms ( gmx_pme_t *  pme,
int  numAtoms,
gmx::ArrayRef< const real chargesA,
gmx::ArrayRef< const real chargesB 
)

This function updates the local atom data on GPU after DD (charges, coordinates, etc.). TODO: it should update the PME CPU atom data as well. (currently PME CPU call gmx_pme_do() gets passed the input pointers for each computation).

Parameters
[in,out]pmeThe PME structure.
[in]numAtomsThe number of particles.
[in]chargesAThe pointer to the array of particle charges in the normal state or FEP state A. Can be nullptr if PME is not performed on the GPU.
[in]chargesBThe pointer to the array of particle charges in state B. Only used if charges are perturbed and can otherwise be nullptr.
int numGridLinesForExtendedHaloRegion ( int  pmeOrder,
real  haloExtentForAtomDisplacement,
real  gridSpacing 
)

Calculates the number of grid lines used as halo region for PME decomposition.

This function is only used for PME GPU decomposition case

Parameters
[in]pmeOrderPME interpolation order
[in]haloExtentForAtomDisplacementextent of halo region in nm to account for atom displacement
[in]gridSpacingspacing between grid lines
Returns
extended halo region used in GPU PME decomposition
static bool pme_gpu_check_restrictions ( const gmx_pme_t *  pme,
std::string *  error 
)
static

Finds out if PME with given inputs is possible to run on GPU. This function is an internal final check, validating the whole PME structure on creation, but it still duplicates the preliminary checks from the above (externally exposed) pme_gpu_supports_input() - just in case.

Parameters
[in]pmeThe PME structure.
[out]errorThe error message if the input is not supported on GPU.
Returns
True if this PME input is possible to run on GPU, false otherwise.
bool pme_gpu_mixed_mode_supports_input ( const t_inputrec &  ir,
std::string *  error 
)

Checks whether the input system allows to run PME on GPU in Mixed mode. Assumes that the input system is compatible with GPU PME otherwise, that is, before calling this function one should check that pme_gpu_supports_input returns true.

Parameters
[in]irInput system.
[out]errorIf non-null, the error message if the input is not supported.
Returns
true if PME can run on GPU in Mixed mode with this input, false otherwise.
bool pme_gpu_supports_build ( std::string *  error)

Checks whether the GROMACS build allows to run PME on GPU. TODO: this partly duplicates an internal PME assert function pme_gpu_check_restrictions(), except that works with a formed gmx_pme_t structure. Should that one go away/work with inputrec?

Parameters
[out]errorIf non-null, the error message when PME is not supported on GPU.
Returns
true if PME can run on GPU on this build, false otherwise.
bool pme_gpu_supports_input ( const t_inputrec &  ir,
std::string *  error 
)

Checks whether the input system allows to run PME on GPU. TODO: this partly duplicates an internal PME assert function pme_gpu_check_restrictions(), except that works with a formed gmx_pme_t structure. Should that one go away/work with inputrec?

Parameters
[in]irInput system.
[out]errorIf non-null, the error message if the input is not supported on GPU.
Returns
true if PME can run on GPU with this input, false otherwise.
PmeRunMode pme_run_mode ( const gmx_pme_t *  pme)

Returns the active PME codepath (CPU, GPU, mixed).

Todo:
This is a rather static data that should be managed by the higher level task scheduler.
Parameters
[in]pmeThe PME data structure.
Returns
active PME codepath.

Variable Documentation

const int gmxCacheLineSize = 64

Number of bytes in a cache line.

Must also be a multiple of the SIMD and SIMD4 register size, to preserve alignment.