Gromacs  2026.0-dev-20241204-d69d709
gpu_utils_hip.cpp File Reference
#include "gmxpre.h"
#include <hip/hip_profile.h>
#include "gromacs/gpu_utils/hiputils.h"
#include "gromacs/hardware/device_information.h"
#include "gromacs/utility/logger.h"
#include "gromacs/utility/stringutil.h"
#include "gpu_utils.h"

Description

Define functions for detection and initialization for HIP devices.

Author
Paul Bauer <paul.bauer.q@gmail.com>

Functions

bool isHostMemoryPinned (const void *h_ptr)
 Tells whether the host buffer was pinned for non-blocking transfers. Only implemented for CUDA.
 
void startGpuProfiler ()
 Starts the GPU profiler if mdrun is being profiled.
 
void stopGpuProfiler ()
 Stops the CUDA profiler if mdrun is being profiled.
 
void resetGpuProfiler ()
 Resets the GPU profiler if mdrun is being profiled.
 
static void peerAccessCheckStat (const hipError_t stat, const int gpuA, const int gpuB, const gmx::MDLogger &mdlog, const char *hipCallName)
 Check and act on status returned from peer access HIP call.
 
void setupGpuDevicePeerAccess (gmx::ArrayRef< const int > gpuIdsToUse, const gmx::MDLogger &mdlog)
 Enable peer access between GPUs where supported.
 
void checkPendingDeviceErrorBetweenSteps ()
 Check for API errors to avoid propagating these across e.g. MD steps.
 

Function Documentation

static void peerAccessCheckStat ( const hipError_t  stat,
const int  gpuA,
const int  gpuB,
const gmx::MDLogger &  mdlog,
const char *  hipCallName 
)

Check and act on status returned from peer access HIP call.

If the status is "hipSuccess", we continue. If it is "hipErrorPeerAccessAlreadyEnabled", peer access has already been enabled, so we ignore it. If it is "hipErrorInvalidDevice", the run is trying to access an invalid GPU, so we throw an error. If it is "hipErrorInvalidValue", there is a problem with the arguments to the HIP call, and we throw an error. These cover all expected statuses; if any other status is returned, we issue a warning and continue.

Parameters
[in]  stat         HIP call return status
[in]  gpuA         ID for GPU initiating peer access call
[in]  gpuB         ID for remote GPU
[in]  mdlog        Logger object
[in]  hipCallName  Name of HIP peer access call

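The status handling described above can be sketched as a small dispatch. This is only an illustration, not the GROMACS implementation: the enum PeerStatus and the function checkPeerStatusSketch are hypothetical stand-ins (the real function takes a hipError_t and logs through gmx::MDLogger), chosen so that the example compiles without the HIP headers.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for the hipError_t values named in the description.
enum class PeerStatus
{
    Success,
    PeerAccessAlreadyEnabled,
    InvalidDevice,
    InvalidValue,
    Other
};

// Hypothetical helper: returns a warning text (empty when there is nothing to
// report) and throws for the two statuses the description treats as errors.
std::string checkPeerStatusSketch(PeerStatus stat, int gpuA, int gpuB, const std::string& callName)
{
    // Assumption of this sketch: peer access is only set up between distinct GPUs.
    assert(gpuA != gpuB);
    const std::string where =
            callName + " from GPU " + std::to_string(gpuA) + " to GPU " + std::to_string(gpuB);
    switch (stat)
    {
        case PeerStatus::Success:                  // nothing to do, continue
        case PeerStatus::PeerAccessAlreadyEnabled: // already enabled, ignore
            return "";
        case PeerStatus::InvalidDevice:
            throw std::runtime_error(where + " failed: invalid GPU");
        case PeerStatus::InvalidValue:
            throw std::runtime_error(where + " failed: invalid arguments");
        default:
            // Unexpected status: warn and continue.
            return "Warning: " + where + " returned an unexpected status";
    }
}
```

A caller would pass the returned text to its logger when it is non-empty and let the exceptions propagate.
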
void resetGpuProfiler ( )

Resets the GPU profiler if mdrun is being profiled.

When a profiler run is in progress (based on the presence of the NVPROF_ID environment variable), the profiler data is reset in order to eliminate the data collected from the preceding part of the run.

This function should typically be called at the mdrun counter reset time.

Note that this is implemented only for the CUDA API.

void setupGpuDevicePeerAccess ( gmx::ArrayRef< const int >  gpuIdsToUse,
const gmx::MDLogger &  mdlog 
)

Enable peer access between GPUs where supported.

Parameters
[in]  gpuIdsToUse  List of GPU IDs in use
[in]  mdlog        Logger object

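One plausible shape for "enable peer access where supported" is a loop over all ordered pairs of the GPUs in use, querying support for each pair before enabling it. The sketch below only computes the pairs and is not taken from the GROMACS source: peerPairsToEnable and the CanAccessPeer callback are hypothetical, with the callback standing in for a query such as hipDeviceCanAccessPeer so that the example runs without a GPU.

```cpp
#include <cassert>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical callback standing in for a device query such as hipDeviceCanAccessPeer.
using CanAccessPeer = std::function<bool(int gpuA, int gpuB)>;

// Hypothetical helper: lists the ordered (gpuA, gpuB) pairs for which gpuA
// could be given peer access to gpuB.
std::vector<std::pair<int, int>> peerPairsToEnable(const std::vector<int>& gpuIdsToUse,
                                                   const CanAccessPeer&    canAccessPeer)
{
    assert(canAccessPeer); // the callback must be callable
    std::vector<std::pair<int, int>> pairs;
    for (int gpuA : gpuIdsToUse)
    {
        for (int gpuB : gpuIdsToUse)
        {
            // A GPU needs no peer access to itself; other pairs are kept
            // only where the query reports support.
            if (gpuA != gpuB && canAccessPeer(gpuA, gpuB))
            {
                pairs.emplace_back(gpuA, gpuB);
            }
        }
    }
    return pairs;
}
```

In a real HIP build, each returned pair would correspond to selecting gpuA as the current device and requesting access to gpuB, with the returned status checked as peerAccessCheckStat describes.
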
void startGpuProfiler ( )

Starts the GPU profiler if mdrun is being profiled.

When a profiler run is in progress (based on the presence of the NVPROF_ID environment variable), the profiler is started to begin collecting data during the rest of the run (or until stopGpuProfiler is called).

Note that this is implemented only for the CUDA API.
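The "profiler run in progress" test is described as a check for the presence of the NVPROF_ID environment variable. A minimal sketch of such a presence check follows; isEnvVarSet and isProfiledRunSketch are hypothetical names used for illustration, not functions from this file.

```cpp
#include <cassert>
#include <cstdlib>

// Hypothetical helper: true when the named environment variable is present,
// regardless of its value.
bool isEnvVarSet(const char* name)
{
    assert(name != nullptr);
    return std::getenv(name) != nullptr;
}

// Hypothetical gate mirroring the described check on NVPROF_ID.
bool isProfiledRunSketch()
{
    return isEnvVarSet("NVPROF_ID");
}
```

Under this assumption, the start, stop and reset calls would each return early when the gate is false, so that unprofiled runs pay no cost.
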

void stopGpuProfiler ( )

Stops the CUDA profiler if mdrun is being profiled.

This function can be called at cleanup when it is desirable to stop subsequent API calls from being traced/profiled, e.g. before uninitialization.

Note that this is implemented only for the CUDA API.