Gromacs
2026.0-dev-20241204-d69d709
#include <cstddef>
#include <array>
#include <memory>
#include <optional>
#include <string>
#include <vector>
#include "gromacs/utility/basedefinitions.h"
#include "gromacs/utility/iserializer.h"
Declares functions to manage GPU resources.
This has several implementations: one for each supported GPU platform, and a stub implementation if the build does not support GPUs.
Classes | |
class | gmx::ArrayRef< typename > |
STL-like interface to a C array of T (or part of a std container of T). More... | |
Functions | |
void | warnWhenDeviceNotTargeted (const gmx::MDLogger &mdlog, const DeviceInformation &deviceInfo) |
Warn to the logger when the detected device was not one of the targets selected at configure time for compilation. More... | |
bool | canPerformDeviceDetection (std::string *errorMessage) |
Return whether GPUs can be detected. More... | |
bool | isDeviceDetectionEnabled () |
Return whether GPU detection is enabled. More... | |
bool | isDeviceDetectionFunctional (std::string *errorMessage) |
Return whether GPU detection is functioning correctly. More... | |
DeviceVendor | getDeviceVendor (const char *vendorName) |
Returns a DeviceVendor value corresponding to the input OpenCL vendor name. More... | |
int | getDeviceComputeUnitFactor (const DeviceInformation &deviceInfo) |
Get the factor to divide the number of compute units by. More... | |
std::vector< std::unique_ptr < DeviceInformation > > | findDevices () |
Find all GPUs in the system. More... | |
std::vector < std::reference_wrapper < DeviceInformation > > | getCompatibleDevices (const std::vector< std::unique_ptr< DeviceInformation >> &deviceInfoList) |
Return a container of device-information handles that are compatible. More... | |
std::vector< int > | getCompatibleDeviceIds (gmx::ArrayRef< const std::unique_ptr< DeviceInformation >> deviceInfoList) |
Return a container of the IDs of the compatible GPUs. More... | |
bool | deviceIdIsCompatible (gmx::ArrayRef< const std::unique_ptr< DeviceInformation >> deviceInfoList, int deviceId) |
Return whether deviceId is found in deviceInfoList and is compatible. More... | |
gmx::GpuAwareMpiStatus | getMinimalSupportedGpuAwareMpiStatus (gmx::ArrayRef< const std::unique_ptr< DeviceInformation >> deviceInfoList) |
Return whether all compatible devices in deviceInfoList support GPU-aware MPI. More... | |
void | setActiveDevice (const DeviceInformation &deviceInfo) |
Set the active GPU. More... | |
void | releaseDevice () |
Releases the GPU device used by the active context at the time of calling. More... | |
std::string | getDeviceInformationString (const DeviceInformation &deviceInfo) |
Formats and returns a device information string for a given GPU. More... | |
std::string | getDeviceCompatibilityDescription (gmx::ArrayRef< const std::unique_ptr< DeviceInformation >> deviceInfoList, int deviceId) |
Return a string describing how compatible the GPU with given deviceId is. More... | |
void | serializeDeviceInformations (const std::vector< std::unique_ptr< DeviceInformation >> &deviceInfoList, gmx::ISerializer *serializer) |
Serialization of information on devices for MPI broadcasting. More... | |
std::vector< std::unique_ptr < DeviceInformation > > | deserializeDeviceInformations (gmx::ISerializer *serializer) |
Deserialization of information on devices after MPI broadcasting. More... | |
int | uniqueDeviceId (const DeviceInformation &deviceInfo) |
Return an ID (non-negative integer) for the described GPU that may be unique. More... | |
std::optional< std::array < std::byte, 16 > > | uuidForDevice (const DeviceInformation &deviceInfo) |
Return the optional UUID detected for the indicated device. | |
void | doubleCheckGpuAwareMpiWillWork (const DeviceInformation &deviceInfo) |
bool canPerformDeviceDetection | ( | std::string * | errorMessage | ) |
Return whether GPUs can be detected.
Returns true when this is a build of GROMACS configured to support GPU usage, GPU detection is not disabled by the GMX_DISABLE_GPU_DETECTION environment variable, and a valid device driver, ICD, and/or runtime was detected. Does not throw.
[out] | errorMessage | When returning false on a build configured with GPU support and non-nullptr was passed, the string contains a descriptive message about why GPUs cannot be detected. |
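The conditions listed here read as the combination of those documented for isDeviceDetectionEnabled() and isDeviceDetectionFunctional(). The following is a minimal sketch of that relationship, not the GROMACS implementation: the `...Sketch` functions, their boolean parameters, and the message text are hypothetical stand-ins introduced only for illustration.

```cpp
#include <string>

// Hypothetical stand-in for isDeviceDetectionEnabled(): the build must support
// GPUs and GMX_DISABLE_GPU_DETECTION must not disable detection.
bool detectionEnabledSketch(bool buildSupportsGpus, bool disabledByEnvironment)
{
    return buildSupportsGpus && !disabledByEnvironment;
}

// Hypothetical stand-in for isDeviceDetectionFunctional(): reports whether a
// usable driver, ICD, or runtime was found, and fills the optional message
// only when the check fails and a non-null pointer was passed.
bool detectionFunctionalSketch(bool runtimeFound, std::string* errorMessage)
{
    if (!runtimeFound && errorMessage != nullptr)
    {
        *errorMessage = "no valid device driver, ICD, or runtime was detected";
    }
    return runtimeFound;
}

// Sketch of the combined query: both conditions must hold.
bool canPerformDetectionSketch(bool         buildSupportsGpus,
                               bool         disabledByEnvironment,
                               bool         runtimeFound,
                               std::string* errorMessage)
{
    return detectionEnabledSketch(buildSupportsGpus, disabledByEnvironment)
           && detectionFunctionalSketch(runtimeFound, errorMessage);
}
```

In this sketch, passing nullptr for the message is always safe, which mirrors the documented "non-nullptr was passed" condition on the out-parameter.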
std::vector<std::unique_ptr<DeviceInformation> > deserializeDeviceInformations | ( | gmx::ISerializer * | serializer | ) |
Deserialization of information on devices after MPI broadcasting.
[in] | serializer | Serializing object. |
bool deviceIdIsCompatible | ( | gmx::ArrayRef< const std::unique_ptr< DeviceInformation >> | deviceInfoList, |
int | deviceId | ||
) |
Return whether deviceId is found in deviceInfoList and is compatible.
This function filters the result of the detection for compatible GPUs, based on the previously run compatibility tests.
[in] | deviceInfoList | Information on the available devices. |
[in] | deviceId | The device ID to find in the list. |
RangeError | If deviceId does not match the id of any device in deviceInfoList |
Returns whether deviceId is compatible.
void doubleCheckGpuAwareMpiWillWork | ( | const DeviceInformation & | deviceInfo | ) |
Run a possible check that GPU-aware MPI will work on deviceInfo.
InvalidInputError | if the user's choices would lead to a crash |
std::vector<std::unique_ptr<DeviceInformation> > findDevices | ( | ) |
Find all GPUs in the system.
Will detect every GPU supported by the device driver in use. Must only be called if canPerformDeviceDetection() has returned true. This routine also checks the compatibility of each device and fills the deviceInfo array with the required information on each device: ID, device properties, status.
Note that this function leaves the GPU runtime API error state clean; this is currently implemented in the CUDA flavor. This invalidates any existing CUDA streams, allocated memory on GPU, etc.
InternalError | if a GPU API returns an unexpected failure (because the call to canPerformDeviceDetection() should always prevent this occurring) |
std::vector<int> getCompatibleDeviceIds | ( | gmx::ArrayRef< const std::unique_ptr< DeviceInformation >> | deviceInfoList | ) |
Return a container of the IDs of the compatible GPUs.
This function filters the result of the detection for compatible GPUs, based on the previously run compatibility tests.
[in] | deviceInfoList | Information on the available devices. |
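The filtering described above, together with the lookup behavior documented for deviceIdIsCompatible, can be sketched as follows. This is an illustration under stated assumptions rather than the library's code: `DeviceInfoSketch`, `StatusSketch`, and the `...Sketch` functions are hypothetical stand-ins for the real types, and `std::out_of_range` stands in for the documented RangeError.

```cpp
#include <memory>
#include <stdexcept>
#include <vector>

// Hypothetical stand-ins for DeviceInformation and its stored status.
enum class StatusSketch
{
    Compatible,
    Incompatible
};

struct DeviceInfoSketch
{
    int          id;
    StatusSketch status;
};

using DeviceListSketch = std::vector<std::unique_ptr<DeviceInfoSketch>>;

// Collect the IDs of devices whose previously stored status is Compatible.
std::vector<int> compatibleIdsSketch(const DeviceListSketch& devices)
{
    std::vector<int> ids;
    for (const auto& device : devices)
    {
        if (device->status == StatusSketch::Compatible)
        {
            ids.push_back(device->id);
        }
    }
    return ids;
}

// Look up one ID and report its compatibility; throws when the ID is absent.
bool idIsCompatibleSketch(const DeviceListSketch& devices, int deviceId)
{
    for (const auto& device : devices)
    {
        if (device->id == deviceId)
        {
            return device->status == StatusSketch::Compatible;
        }
    }
    throw std::out_of_range("device ID not found in the device list");
}
```

Neither function re-runs any compatibility test; both only read the status recorded at detection time, which is the behavior the description implies.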
std::vector<std::reference_wrapper<DeviceInformation> > getCompatibleDevices | ( | const std::vector< std::unique_ptr< DeviceInformation >> & | deviceInfoList | ) |
Return a container of device-information handles that are compatible.
This function filters the result of the detection for compatible GPUs, based on the previously run compatibility tests.
[in] | deviceInfoList | Information on the available devices. |
std::string getDeviceCompatibilityDescription | ( | gmx::ArrayRef< const std::unique_ptr< DeviceInformation >> | deviceInfoList, |
int | deviceId | ||
) |
Return a string describing how compatible the GPU with the given deviceId is.
[in] | deviceInfoList | Information on the available devices. |
[in] | deviceId | An index of the device to check. |
int getDeviceComputeUnitFactor | ( | const DeviceInformation & | deviceInfo | ) |
Get the factor to divide the number of compute units by.
OpenCL and SYCL can report the number of Compute Units (CUs) a device has; see CL_DEVICE_MAX_COMPUTE_UNITS and info::device::max_compute_units. However, "CU" is only vaguely defined by the standard, and the same API call returns different things for different vendors:
On NVIDIA, that is the number of SMs.
On AMD, that is the number of Compute Units, which are similar to CUDA SMs, except on RDNA, where the number of Dual Compute Units is returned (https://stackoverflow.com/a/63976796/929437).
On Intel, that is the number of EUs (XVEs), which are similar to CUDA cores. The concept similar to a CUDA SM is called a sub-slice (Xe Core, XC), and it contains 16 EUs (Gen9-Gen11, Xe).
This function uses the CUDA SM as a reference. To get the number of SM-like units on a device, divide the result of the CL_DEVICE_MAX_COMPUTE_UNITS / info::device::max_compute_units API call by the value returned by this function.
[in] | deviceInfo | Device information. |
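The division described above can be sketched as a one-line helper. `smLikeUnitsSketch` is a hypothetical name, and the inputs used in the note below are illustrative values rather than figures for any particular device; in real code the factor would come from getDeviceComputeUnitFactor().

```cpp
#include <cassert>

// Convert a reported compute-unit count into SM-like units by dividing it by
// the vendor-dependent factor.
int smLikeUnitsSketch(int reportedComputeUnits, int computeUnitFactor)
{
    assert(computeUnitFactor > 0); // a zero or negative factor would be meaningless
    return reportedComputeUnits / computeUnitFactor;
}
```

For example, if an Intel device reported 512 EUs and the factor were 16 (the number of EUs per sub-slice given above), the helper would yield 32 SM-like units.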
std::string getDeviceInformationString | ( | const DeviceInformation & | deviceInfo | ) |
Formats and returns a device information string for a given GPU.
Returns a formatted info string for the given GPU, which includes its ID, name, compute capability, and detection status.
[in] | deviceInfo | Information on the device to describe. |
DeviceVendor getDeviceVendor | ( | const char * | vendorName | ) |
Returns a DeviceVendor value corresponding to the input OpenCL vendor name.
gmx::GpuAwareMpiStatus getMinimalSupportedGpuAwareMpiStatus | ( | gmx::ArrayRef< const std::unique_ptr< DeviceInformation >> | deviceInfoList | ) |
Return whether all compatible devices in deviceInfoList support GPU-aware MPI.
bool isDeviceDetectionEnabled | ( | ) |
Return whether GPU detection is enabled.
Returns true when this is a build of GROMACS configured to support GPU usage and GPU detection is not disabled by the GMX_DISABLE_GPU_DETECTION environment variable.
Does not throw.
bool isDeviceDetectionFunctional | ( | std::string * | errorMessage | ) |
Return whether GPU detection is functioning correctly.
Returns true when this is a build of GROMACS configured to support GPU usage, and a valid device driver, ICD, and/or runtime was detected.
This function is not intended to be called from build configurations that do not support GPUs, and there will be no descriptive message in that case.
[out] | errorMessage | When returning false on a build configured with GPU support and non-nullptr was passed, the string contains a descriptive message about why GPUs cannot be detected. |
Does not throw.
void releaseDevice | ( | ) |
Releases the GPU device used by the active context at the time of calling.
With CUDA, the device is reset and therefore all data uploaded to the GPU is lost. This must only be called when none of this data is required anymore, because subsequent attempts to free memory associated with the context will otherwise fail. Calls gmx_warning upon errors.
With other GPU SDKs, does nothing.
Should only be called after setActiveDevice was called.
void serializeDeviceInformations | ( | const std::vector< std::unique_ptr< DeviceInformation >> & | deviceInfoList, |
gmx::ISerializer * | serializer | ||
) |
Serialization of information on devices for MPI broadcasting.
[in] | deviceInfoList | The vector of device information to serialize. |
[in] | serializer | Serializing object. |
void setActiveDevice | ( | const DeviceInformation & | deviceInfo | ) |
Set the active GPU.
This makes the device described by the passed device information the active one. Essential in CUDA, where the device buffers and kernel launches are not connected to the device context. In OpenCL, checks the device vendor and makes vendor-specific performance adjustments.
[in] | deviceInfo | Information on the device to be set. |
Issues a fatal error for any critical errors that occur during initialization.
int uniqueDeviceId | ( | const DeviceInformation & | deviceInfo | ) |
Return an ID (non-negative integer) for the described GPU that may be unique.
If a UUID is available, returns its hash. Otherwise, returns the DeviceInformation::id field.
Note that the value used on different ranks may or may not be a reliable indicator of whether the ranks share devices, depending on how that ID was constructed, perhaps depending on what devices were visible to different ranks.
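The hash-or-fallback rule described above can be sketched as follows. This is an illustration only: the choice of 32-bit FNV-1a is an assumption (the documentation does not say which hash is used), and `UuidSketch` and `uniqueIdSketch` are hypothetical names.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <optional>

using UuidSketch = std::array<std::byte, 16>;

// Hash the UUID when one is present (FNV-1a, an assumed choice); otherwise
// fall back to the plain device ID. The hash is masked to 31 bits so that the
// result is a non-negative int.
int uniqueIdSketch(const std::optional<UuidSketch>& uuid, int fallbackId)
{
    if (!uuid.has_value())
    {
        return fallbackId;
    }
    std::uint32_t hash = 2166136261u; // FNV-1a 32-bit offset basis
    for (std::byte b : *uuid)
    {
        hash ^= std::to_integer<std::uint32_t>(b);
        hash *= 16777619u; // FNV-1a 32-bit prime
    }
    return static_cast<int>(hash & 0x7fffffffu);
}
```

Because a 16-byte UUID is reduced to a 31-bit value, distinct devices can collide, which is consistent with the "may be unique" wording in the description.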
void warnWhenDeviceNotTargeted | ( | const gmx::MDLogger & | mdlog, |
const DeviceInformation & | deviceInfo | ||
) |
Warn to the logger when the detected device was not one of the targets selected at configure time for compilation.
[in] | mdlog | Logger |
[in] | deviceInfo | The device to potentially warn about |