Gromacs  2025-dev-20241002-88a4191
device_management_sycl.cpp File Reference
#include "gmxpre.h"
#include "config.h"
#include <algorithm>
#include <map>
#include <optional>
#include <tuple>
#include <vector>
#include "gromacs/gpu_utils/gmxsycl.h"
#include "gromacs/hardware/device_management.h"
#include "gromacs/hardware/device_management_sycl_intel_device_ids.h"
#include "gromacs/utility/arrayref.h"
#include "gromacs/utility/exceptions.h"
#include "gromacs/utility/fatalerror.h"
#include "gromacs/utility/mpiinfo.h"
#include "gromacs/utility/strconvert.h"
#include "gromacs/utility/stringutil.h"
#include "gromacs/utility/unique_cptr.h"
#include "device_information.h"

Description

Defines the SYCL implementations of the device management.

Author
Paul Bauer <paul.bauer.q@gmail.com>
Erik Lindahl <erik.lindahl@gmail.com>
Artem Zhmurov <zhmurov@gmail.com>
Andrey Alekseenko <al42and@gmail.com>

Functions

static std::optional< std::tuple< int, int > > parseHardwareVersionNvidia (const std::string &archName)

static std::optional< std::tuple< int, int > > getHardwareVersionNvidia (const sycl::device &device)

static std::optional< std::tuple< int, int, int > > parseHardwareVersionAmd (const std::string &archName)

static std::optional< std::tuple< int, int, int > > getHardwareVersionAmd (const sycl::device &device)

static std::optional< std::tuple< int, int, int > > getHardwareVersionIntel (const sycl::device &device)

static gmx::GpuAwareMpiStatus getDeviceGpuAwareMpiStatus (const sycl::backend backend)
 
void warnWhenDeviceNotTargeted (const gmx::MDLogger &, const DeviceInformation &)
 Warn to the logger when the detected device was not one of the targets selected at configure time for compilation.
 
bool isDeviceDetectionFunctional (std::string *errorMessage)
 Return whether GPU detection is functioning correctly.
 
static DeviceStatus isDeviceCompatible (const sycl::device &syclDevice, const DeviceVendor gmx_unused deviceVendor, gmx::ArrayRef< const int > supportedSubGroupSizes)
 Checks that the given device is compatible with GROMACS.
 
static bool isDeviceFunctional (const sycl::device &syclDevice, std::string *errorMessage)
 Checks that the given device is sane (i.e., can run a kernel).
 
static DeviceStatus checkDevice (size_t deviceId, const DeviceInformation &deviceInfo)
 Checks that device deviceInfo is compatible and functioning.
 
static std::vector< sycl::device > partitionDevices (const std::vector< sycl::device > &&devices)
 
std::vector< std::unique_ptr< DeviceInformation > > findDevices ()
 Find all GPUs in the system.
 
void setActiveDevice (const DeviceInformation &)
 Set the active GPU.
 
void releaseDevice ()
 Releases the GPU device used by the active context at the time of calling.
 
std::string getDeviceInformationString (const DeviceInformation &deviceInfo)
 Formats and returns a device information string for a given GPU.
 

Function Documentation

static DeviceStatus checkDevice (size_t deviceId, const DeviceInformation &deviceInfo)

Checks that device deviceInfo is compatible and functioning.

Checks the given SYCL device for compatibility and runs a dummy kernel on it to determine whether the device functions properly.

Parameters
  [in]  deviceId    Device number (internal to GROMACS).
  [in]  deviceInfo  The device information.
Returns
  The status of the device.
std::vector< std::unique_ptr< DeviceInformation > > findDevices ()

Find all GPUs in the system.

Detects every GPU supported by the device driver in use. Must only be called if canPerformDeviceDetection() has returned true. This routine also checks the compatibility of each device and fills the deviceInfo array with the required information on each device: ID, device properties, and status.

Note that this function leaves the GPU runtime API error state clean; this is currently implemented in the CUDA flavor. Doing so invalidates any existing CUDA streams, memory allocated on the GPU, etc.

Todo:
  Check whether errors propagate in OpenCL as they do in CUDA, and whether there is a mechanism to "clear" them.
Returns
  Standard vector with the list of devices found.
Exceptions
  InternalError  if a GPU API returns an unexpected failure (because the call to canDetectGpus() should always prevent this occurring).
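
As an illustration of how a caller might consume the returned list, the following is a minimal sketch that collects the IDs of devices whose status is compatible. The DeviceStatus and DeviceInformation definitions here are simplified stand-ins (the real ones live in device_information.h and carry more fields), and compatibleDeviceIds is a hypothetical helper that is not part of GROMACS.

```cpp
#include <memory>
#include <vector>

// Simplified stand-ins for the GROMACS types; assumptions for illustration only.
enum class DeviceStatus
{
    Compatible,
    Incompatible,
    NonFunctional
};

struct DeviceInformation
{
    int          id;     // device number internal to the application
    DeviceStatus status; // result of the compatibility/functionality checks
};

// Hypothetical helper: walk a list shaped like the return value of
// findDevices() and keep only the IDs of devices marked Compatible.
std::vector<int> compatibleDeviceIds(const std::vector<std::unique_ptr<DeviceInformation>>& devices)
{
    std::vector<int> ids;
    for (const auto& device : devices)
    {
        if (device && device->status == DeviceStatus::Compatible)
        {
            ids.push_back(device->id);
        }
    }
    return ids;
}
```

In real code the list would come from findDevices(), called only after canPerformDeviceDetection() has returned true, as required above.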
std::string getDeviceInformationString (const DeviceInformation &deviceInfo)

Formats and returns a device information string for a given GPU.

Given an index directly into the array of available GPUs, returns a formatted info string for the respective GPU which includes ID, name, compute capability, and detection status.

Parameters
  [in]  deviceInfo  Information on the device to be described.
Returns
  A string describing the device.
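
The exact layout of the returned string is not specified on this page. Purely as a sketch of the idea (one line combining ID, name, and detection status), the following uses a hypothetical DeviceSummary struct and a hypothetical formatDeviceSummary function; neither the names nor the output format are taken from GROMACS.

```cpp
#include <sstream>
#include <string>

// Hypothetical, simplified record; the real DeviceInformation differs.
struct DeviceSummary
{
    int         id;
    std::string name;
    std::string status;
};

// Build a single-line description from the fields above.
// The layout is an assumption chosen for illustration.
std::string formatDeviceSummary(const DeviceSummary& device)
{
    std::ostringstream out;
    out << "#" << device.id << ": " << device.name << ", status: " << device.status;
    return out.str();
}
```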
static DeviceStatus isDeviceCompatible (const sycl::device &syclDevice, const DeviceVendor gmx_unused deviceVendor, gmx::ArrayRef< const int > supportedSubGroupSizes)

Checks that the given device is compatible with GROMACS.

Parameters
  [in]  syclDevice              SYCL device handle.
  [in]  deviceVendor            Device vendor.
  [in]  supportedSubGroupSizes  List of supported sub-group sizes as reported by the device.
Returns
  The status enumeration value for the checked device.
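
The criteria applied by isDeviceCompatible are not spelled out on this page. One plausible ingredient, given the supportedSubGroupSizes parameter, is a membership test against the sizes the kernels were built for. The sketch below shows only that idea; supportsAllSubGroupSizes is a hypothetical helper, and std::vector<int> stands in for gmx::ArrayRef<const int>.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical helper: true if every required sub-group size appears in the
// list of sizes reported by the device. Not the actual GROMACS check.
bool supportsAllSubGroupSizes(const std::vector<int>& requiredSizes,
                              const std::vector<int>& supportedSubGroupSizes)
{
    return std::all_of(requiredSizes.begin(), requiredSizes.end(), [&](int required) {
        return std::find(supportedSubGroupSizes.begin(), supportedSubGroupSizes.end(), required)
               != supportedSubGroupSizes.end();
    });
}
```

A caller could map a false result to an "incompatible" status value and a true result to further checks.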
bool isDeviceDetectionFunctional (std::string *errorMessage)

Return whether GPU detection is functioning correctly.

Returns true when this is a build of GROMACS configured to support GPU usage, and a valid device driver, ICD, and/or runtime was detected.

This function is not intended to be called from build configurations that do not support GPUs, and there will be no descriptive message in that case.

Parameters
  [out]  errorMessage  When returning false on a build configured with GPU support and a non-null pointer was passed, the string contains a descriptive message about why GPUs cannot be detected.

Does not throw.
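
The out-parameter contract described above (write a message only on failure, and only when the caller supplied a non-null pointer) can be sketched as follows. detectionFunctionalSketch and its runtimeFound flag are hypothetical; the real function queries the GPU runtime rather than taking a flag, and the message text is invented for the example.

```cpp
#include <string>

// Hypothetical stand-in that mimics only the documented calling contract.
bool detectionFunctionalSketch(bool runtimeFound, std::string* errorMessage) noexcept(false)
{
    if (runtimeFound)
    {
        return true; // success: errorMessage is left untouched
    }
    if (errorMessage != nullptr)
    {
        // Only dereference the pointer when the caller provided one.
        *errorMessage = "No usable GPU runtime was detected";
    }
    return false;
}
```

Callers that do not need the diagnostic can pass nullptr and inspect only the return value.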

static bool isDeviceFunctional (const sycl::device &syclDevice, std::string *errorMessage)

Checks that the given device is sane (i.e., can run a kernel).

Compiles and runs a dummy kernel to determine whether the given SYCL device functions properly.

Parameters
  [in]   syclDevice    SYCL device handle.
  [out]  errorMessage  An error message related to a SYCL error.
Exceptions
  std::bad_alloc  When out of memory.
Returns
  Whether the device passed sanity checks.
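
The error-handling shape implied by this description (run a trivial job, let std::bad_alloc propagate, and turn other failures into a message plus a false return) can be sketched without a SYCL toolchain. In the sketch below, deviceFunctionalSketch is a hypothetical name and the std::function argument is a stand-in for compiling and submitting the dummy kernel; how the real implementation classifies exceptions is an assumption.

```cpp
#include <exception>
#include <functional>
#include <new>
#include <stdexcept> // std::runtime_error, handy for callers simulating a failure
#include <string>

// Hypothetical sketch: runDummyKernel stands in for the SYCL kernel submission.
bool deviceFunctionalSketch(const std::function<void()>& runDummyKernel, std::string* errorMessage)
{
    try
    {
        runDummyKernel();
    }
    catch (const std::bad_alloc&)
    {
        throw; // out-of-memory is documented as reaching the caller
    }
    catch (const std::exception& e)
    {
        // Any other failure is reported through the out-parameter, if given.
        if (errorMessage != nullptr)
        {
            *errorMessage = e.what();
        }
        return false;
    }
    return true;
}
```

For example, passing a callable that throws std::runtime_error("kernel failed") yields false and copies that text into the message, while a callable that returns normally yields true.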
void releaseDevice ()

Releases the GPU device used by the active context at the time of calling.

With CUDA, the device is reset and therefore all data uploaded to the GPU is lost. This must only be called when none of this data is required anymore, because subsequent attempts to free memory associated with the context will otherwise fail. Calls gmx_warning upon errors.

With other GPU SDKs, does nothing.

Should only be called after setActiveDevice was called.

void setActiveDevice (const DeviceInformation &deviceInfo)

Set the active GPU.

Makes the device described by deviceInfo the active device. This is essential in CUDA, where the device buffers and kernel launches are not connected to the device context. In OpenCL, it checks the device vendor and makes vendor-specific performance adjustments.

Parameters
  [in]  deviceInfo  Information on the device to be set.

Issues a fatal error for any critical errors that occur during initialization.

void warnWhenDeviceNotTargeted (const gmx::MDLogger &mdlog, const DeviceInformation &deviceInfo)

Warn to the logger when the detected device was not one of the targets selected at configure time for compilation.

Parameters
  [in]  mdlog       Logger.
  [in]  deviceInfo  The device to potentially warn about.