NVSHMEM-based fused GPU halo exchange.
Aggregates all pulses into a single structure for a fused GPU-initiated exchange path using NVSHMEM. Combining coordinate (X) and force (F) metadata is a by-product of this aggregation. The class:
- Builds per-dimension/pulse metadata and aggregates entries into one struct
- Allocates unified device buffers for send/recv regions and computes per-pulse offsets aligned to c_haloEntryAlignBytes for coalesced access and high-throughput transfers (e.g., NVSHMEM device puts/gets, CUDA vector/TMA async copies)
- Exposes device pointers per entry for pack/unpack kernels
- Manages NVSHMEM signal buffers via a single symmetric allocation and pointer-offset views (see implementation for exact layout)
- Reduces MPI collectives: unified send/recv sizes enable a single MPI_Allreduce for max sizing instead of per-pulse max-reductions
- Max-reduces unified buffer sizes across ranks so that every rank allocates buffers of identical size
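
The per-pulse offset computation mentioned in the list above can be sketched as follows. This is a minimal illustration, not the class's actual code: the value of `c_haloEntryAlignBytes` is not stated here, so the constant `c_exampleAlignBytes` (128) is a placeholder, and `alignUp`, `PulseLayout` and `computePulseOffsets` are hypothetical names. Each pulse's region starts at a multiple of the alignment, and the running total gives the size of the unified buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Placeholder for c_haloEntryAlignBytes; the real value is an assumption here.
constexpr std::size_t c_exampleAlignBytes = 128;

// Round a byte count up to the next multiple of the alignment.
constexpr std::size_t alignUp(std::size_t bytes, std::size_t alignment)
{
    return (bytes + alignment - 1) / alignment * alignment;
}

// Hypothetical result type: one start offset per pulse plus the total buffer size.
struct PulseLayout
{
    std::vector<std::size_t> offsets;
    std::size_t              totalBytes = 0;
};

// Place the pulses back to back in one buffer, starting each at an aligned offset.
PulseLayout computePulseOffsets(const std::vector<std::size_t>& pulseBytes,
                                std::size_t alignment = c_exampleAlignBytes)
{
    assert(alignment > 0);
    PulseLayout layout;
    std::size_t cursor = 0;
    for (std::size_t bytes : pulseBytes)
    {
        layout.offsets.push_back(cursor);    // cursor is always a multiple of alignment
        cursor += alignUp(bytes, alignment); // pad this pulse up to the next boundary
    }
    layout.totalBytes = cursor;
    return layout;
}
```

With pulse sizes of 100, 256 and 1 bytes and a 128-byte alignment, the offsets come out as 0, 128 and 384, and the unified buffer needs 512 bytes.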
| Return type | Member | Description |
| --- | --- | --- |
| | FusedGpuHaloExchange (const DeviceContext &deviceContext, gmx_wallcycle *wcycle, MPI_Comm mpi_comm_mysim, MPI_Comm mpi_comm_mysim_world) | Creates NVSHMEM-based fused GPU halo exchange object. |
| | ~FusedGpuHaloExchange () | Destructor. |
| GpuEventSynchronizer * | launchAllCoordinateExchanges (const matrix box, GpuEventSynchronizer *dependencyEvent) | Launch fused coordinate (X) exchanges across all pulses. |
| GpuEventSynchronizer * | launchAllForceExchanges (bool accumulateForces, FixedCapacityVector< GpuEventSynchronizer *, 2 > *dependencyEvents) | Launch fused force (F) exchanges across all pulses. |
| void | reinitAllHaloExchanges (const t_commrec &cr, DeviceBuffer< RVec > d_coordinatesBuffer, DeviceBuffer< RVec > d_forcesBuffer, DeviceBuffer< uint64_t > d_syncBase, int totalNumPulses) | (Re-)initialize fused halo exchanges for all dimensions and pulses. Builds per-pulse entries, sets shared buffers, NVSHMEM signals, and prepares metadata. |
| void | destroyAllHaloExchangeBuffers () | Destroy unified/symmetric buffers used by fused halo exchange (if any). |
| void | allocateAndCopyHaloExchangeData () | Copy per-pulse metadata to device. |
| GpuEventSynchronizer * | getForcesReadyOnDeviceEvent () | Get the event synchronizer for the fused forces ready on device. |
| void | allocateUnifiedHaloBuffers () | Allocate unified send/recv buffers and point entries into them. Uses max-reduced per-pulse extents to size symmetric buffers and assigns aligned per-entry pointers into those buffers. |
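
The max-reduction of per-pulse extents that `allocateUnifiedHaloBuffers` relies on can be illustrated with a small host-only sketch. It assumes the extents of all pulses are packed into one array per rank, so that a single element-wise maximum covers every pulse at once. In an MPI program this would typically be one `MPI_Allreduce` call with `MPI_MAX` over that array. Here the ranks are simulated with a vector of vectors so that the example runs without MPI. The function name `maxReducePerPulse` and the data layout are hypothetical and not taken from the implementation.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Element-wise maximum over the per-pulse extents reported by each simulated rank.
// sizesPerRank[r][p] is the (hypothetical) extent of pulse p on rank r.
// After the reduction every rank would hold the same vector, so buffers sized
// from it are identical on all ranks.
std::vector<int> maxReducePerPulse(const std::vector<std::vector<int>>& sizesPerRank)
{
    assert(!sizesPerRank.empty());
    std::vector<int> result = sizesPerRank.front();
    for (const std::vector<int>& rankSizes : sizesPerRank)
    {
        assert(rankSizes.size() == result.size()); // same pulse count on every rank
        std::transform(result.begin(), result.end(), rankSizes.begin(), result.begin(),
                       [](int a, int b) { return std::max(a, b); });
    }
    return result;
}
```

Reducing the whole array in one step is what replaces a separate reduction per pulse: three ranks reporting `{10, 40, 7}`, `{12, 35, 7}` and `{9, 41, 3}` all end up with `{12, 41, 7}`.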