#include "gmxpre.h"
#include <cstdlib>
#include <hip/hip_profile.h>
#include "gromacs/gpu_utils/hiputils.h"
#include "gromacs/hardware/device_management.h"
#include "gromacs/utility/logger.h"
#include "gromacs/utility/stringutil.h"
#include "gpu_utils.h"

Include dependency graph for gpu_utils_hip.cpp (graph image not reproduced).

Description

Define functions for detection and initialization for HIP devices.

Authors: Paul Bauer <paul.bauer.q@gmail.com>; Julio Maia <julio.maia@amd.com>

Functions
bool	isHostMemoryPinned (const void *h_ptr)
	Tells whether the host buffer was pinned for non-blocking transfers. Only implemented for CUDA.

void	startGpuProfiler ()
	Starts the GPU profiler if mdrun is being profiled.

void	stopGpuProfiler ()
	Stops the GPU profiler if mdrun is being profiled.

void	resetGpuProfiler ()
	Resets the GPU profiler if mdrun is being profiled.

static void	peerAccessCheckStat (const hipError_t stat, const int gpuA, const int gpuB, const gmx::MDLogger &mdlog, const char *hipCallName)
	Check and act on status returned from peer access HIP call.

void	setupGpuDevicePeerAccess (gmx::ArrayRef< const int > gpuIdsToUse, const gmx::MDLogger &mdlog)
	Enable peer access between GPUs where supported.

void	checkPendingDeviceErrorBetweenSteps ()
	Check for API errors to avoid propagating these across e.g. MD steps.
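Errors from asynchronous GPU work can be reported only by a later runtime call, so checking once between steps ties a failure to the step that caused it. The sketch below illustrates that idea under stated assumptions: `FakeRuntime`, `DeviceStatus` and `checkPendingErrorBetweenSteps` are hypothetical names, and the fake `getLastError()` merely mimics the report-and-clear behaviour of the real `hipGetLastError()`. It is not the GROMACS implementation.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical status codes; a real HIP build would use hipError_t.
enum class DeviceStatus
{
    Success,
    LaunchFailure
};

// Hypothetical stand-in for the runtime's sticky "last error" slot.
struct FakeRuntime
{
    DeviceStatus lastError = DeviceStatus::Success;

    // Mimics hipGetLastError(): return the pending error and reset it to Success.
    DeviceStatus getLastError()
    {
        const DeviceStatus status = lastError;
        lastError                 = DeviceStatus::Success;
        return status;
    }
};

// Called once between MD steps so that an error raised during `step`
// is reported there instead of surfacing in a later, unrelated call.
void checkPendingErrorBetweenSteps(FakeRuntime& runtime, long step)
{
    assert(step >= 0); // steps are counted from zero in this sketch
    if (runtime.getLastError() != DeviceStatus::Success)
    {
        throw std::runtime_error("Pending device error detected after step "
                                 + std::to_string(step));
    }
}
```

Because the fake error slot is cleared when it is read, a single failure is reported exactly once rather than at every subsequent step.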

Function Documentation

static void peerAccessCheckStat(const hipError_t stat, const int gpuA, const int gpuB, const gmx::MDLogger& mdlog, const char* hipCallName)

Check and act on status returned from peer access HIP call.

If status is "hipSuccess", we continue. If "hipErrorPeerAccessAlreadyEnabled", then peer access has already been enabled so we ignore. If "hipErrorInvalidDevice" then the run is trying to access an invalid GPU, so we throw an error. If "hipErrorInvalidValue" then there is a problem with the arguments to the HIP call, and we throw an error. These cover all expected statuses, but if any other is returned we issue a warning and continue.

Parameters

[in]	stat	HIP call return status
[in]	gpuA	ID for GPU initiating peer access call
[in]	gpuB	ID for remote GPU
[in]	mdlog	Logger object
[in]	hipCallName	name of HIP peer access call
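The status handling described above amounts to a small dispatch on the return code. This is a minimal sketch, assuming hypothetical `PeerStatus` and `PeerAction` enums in place of `hipError_t` so that it builds without the HIP headers, and writing the warning to `std::cerr` rather than through `gmx::MDLogger`. The function name `checkPeerAccessStatus` is illustrative only.

```cpp
#include <cassert>
#include <iostream>
#include <stdexcept>
#include <string>

// Hypothetical stand-ins for the hipError_t values named above.
enum class PeerStatus
{
    Success,
    PeerAccessAlreadyEnabled,
    InvalidDevice,
    InvalidValue,
    Other
};

// Hypothetical outcome reported to the caller (not part of the documented API).
enum class PeerAction
{
    Continue,
    Warned
};

PeerAction checkPeerAccessStatus(PeerStatus stat, int gpuA, int gpuB, const std::string& callName)
{
    const std::string context =
            callName + " from GPU " + std::to_string(gpuA) + " to GPU " + std::to_string(gpuB);
    switch (stat)
    {
        case PeerStatus::Success:
        case PeerStatus::PeerAccessAlreadyEnabled:
            // Success, or access was already enabled earlier: nothing to do.
            return PeerAction::Continue;
        case PeerStatus::InvalidDevice:
            throw std::invalid_argument(context + ": invalid GPU ID");
        case PeerStatus::InvalidValue:
            throw std::invalid_argument(context + ": invalid arguments to the call");
        default:
            // Unexpected status: warn and carry on with the run.
            std::cerr << "Warning: unexpected status returned by " << context << '\n';
            return PeerAction::Warned;
    }
}
```

Grouping "success" and "already enabled" into one branch reflects that both leave peer access in the desired state.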

void resetGpuProfiler()

Resets the GPU profiler if mdrun is being profiled.

When a profiler run is in progress (based on the presence of the NVPROF_ID environment variable), the profiler data is reset in order to eliminate the data collected during the preceding part of the run.

This function should typically be called at the mdrun counter reset time.

Note that this is implemented only for the CUDA API.
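One way to picture the reset: data recorded before the mdrun counter reset is discarded, so the profile covers only the remainder of the run. `MockProfiler` and `resetProfilerIfActive` below are hypothetical stand-ins; an actual implementation would call the vendor profiler API rather than clear a vector.

```cpp
#include <cassert>
#include <vector>

// Hypothetical profiler model; not a vendor API.
struct MockProfiler
{
    bool             active = false; // true while a profiler run is in progress
    std::vector<int> samples;        // data recorded so far (here: step numbers)

    void record(int step)
    {
        if (active)
        {
            samples.push_back(step);
        }
    }
};

// Sketch of the reset: drop everything collected before the mdrun counter
// reset so that only the remainder of the run is profiled.
void resetProfilerIfActive(MockProfiler& profiler)
{
    if (profiler.active)
    {
        profiler.samples.clear();
    }
}
```

Calling the reset at the same point where mdrun resets its own counters keeps the profiler data and the performance counters covering the same interval.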

void setupGpuDevicePeerAccess(gmx::ArrayRef<const int> gpuIdsToUse, const gmx::MDLogger& mdlog)

Enable peer access between GPUs where supported.

Parameters

[in]	gpuIdsToUse	List of GPU IDs in use
[in]	mdlog	Logger object
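A plausible shape for the pairwise setup is a double loop over the GPU IDs in use. This sketch assumes hypothetical callbacks that, in a HIP build, might wrap `hipSetDevice`, `hipDeviceCanAccessPeer` and `hipDeviceEnablePeerAccess`; the callbacks keep the example runnable without a GPU. Peer access is granted from the current device to the peer only, so each ordered pair is attempted separately. The name `enablePeerAccessForPairs` is illustrative.

```cpp
#include <cassert>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical callbacks standing in for the HIP runtime calls.
using CanAccessPeerFn = std::function<bool(int gpuA, int gpuB)>;
using EnablePeerFn    = std::function<void(int gpuA, int gpuB)>;

// Try to enable access from gpuA to gpuB for every ordered pair of distinct
// GPUs in use; returns the pairs for which access was enabled.
std::vector<std::pair<int, int>> enablePeerAccessForPairs(const std::vector<int>& gpuIds,
                                                          const CanAccessPeerFn&  canAccessPeer,
                                                          const EnablePeerFn&     enablePeer)
{
    std::vector<std::pair<int, int>> enabled;
    for (int gpuA : gpuIds)
    {
        for (int gpuB : gpuIds)
        {
            if (gpuA == gpuB || !canAccessPeer(gpuA, gpuB))
            {
                continue; // skip self-pairs and links the hardware does not support
            }
            enablePeer(gpuA, gpuB);
            enabled.emplace_back(gpuA, gpuB);
        }
    }
    return enabled;
}
```

In a real build, the status returned by each enable call would presumably be handed to `peerAccessCheckStat` together with the name of the HIP call.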

void startGpuProfiler()

Starts the GPU profiler if mdrun is being profiled.

When a profiler run is in progress (based on the presence of the NVPROF_ID environment variable), the profiler is started and collects data for the rest of the run (or until stopGpuProfiler is called).

Note that this is implemented only for the CUDA API.
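The environment-variable gate can be sketched as follows, assuming a hypothetical `ProfilerState` flag in place of the actual profiler call. The `NVPROF_ID` value is taken as a parameter that defaults to `std::getenv("NVPROF_ID")`, so the check can be exercised without modifying the process environment. `profilerRunInProgress` and `startProfilerIfRequested` are illustrative names.

```cpp
#include <cassert>
#include <cstdlib>

// Hypothetical profiler state; a CUDA build would call the profiler API instead.
struct ProfilerState
{
    bool collecting = false;
};

// A profiler run is assumed to be in progress when NVPROF_ID is set at all.
bool profilerRunInProgress(const char* nvprofId)
{
    return nvprofId != nullptr;
}

// Sketch of the start logic: begin collecting only when running under a profiler.
void startProfilerIfRequested(ProfilerState& state, const char* nvprofId = std::getenv("NVPROF_ID"))
{
    if (profilerRunInProgress(nvprofId))
    {
        state.collecting = true;
    }
}
```

Gating on the variable means an ordinary run pays no cost and needs no profiler installed.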

void stopGpuProfiler()

Stops the GPU profiler if mdrun is being profiled.

This function can be called at cleanup when subsequent API calls should not be traced or profiled, e.g. before uninitialization.

Note that this is implemented only for the CUDA API.
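To illustrate why stopping before uninitialization helps, the hypothetical `TraceRecorder` below records API calls only while recording is enabled; calls made after `stop()` (standing in for `stopGpuProfiler`) are left out of the trace. It is a toy model under that assumption, not the CUDA profiler interface, and the call names in the usage are made up.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical trace recorder standing in for a vendor profiler.
struct TraceRecorder
{
    bool                     recording = true;
    std::vector<std::string> trace;

    // Record an API call only while tracing is active.
    void onApiCall(const std::string& name)
    {
        if (recording)
        {
            trace.push_back(name);
        }
    }

    // Analogous to stopGpuProfiler(): later calls are not traced.
    void stop() { recording = false; }
};
```

Stopping first keeps teardown calls, such as freeing device buffers, from cluttering the profile of the simulation itself.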
