Gromacs
2018.8
|
This file declares functions for timing the load imbalance due to domain decomposition.
Enumerations | |
enum | DdOpenBalanceRegionBeforeForceComputation { DdOpenBalanceRegionBeforeForceComputation::no, DdOpenBalanceRegionBeforeForceComputation::yes } |
Tells if we should open the balancing region. More... | |
enum | DdCloseBalanceRegionAfterForceComputation { DdCloseBalanceRegionAfterForceComputation::no, DdCloseBalanceRegionAfterForceComputation::yes } |
Tells if we should close the balancing region after the force computation has completed. More... | |
enum | DdAllowBalanceRegionReopen { DdAllowBalanceRegionReopen::no, DdAllowBalanceRegionReopen::yes } |
Tells if we should open the balancing region. More... | |
enum | DdBalanceRegionWaitedForGpu { DdBalanceRegionWaitedForGpu::no, DdBalanceRegionWaitedForGpu::yes } |
Tells if we had to wait for a GPU to finish computation. More... | |
Functions | |
BalanceRegion * | ddBalanceRegionAllocate () |
Returns a pointer to a constructed BalanceRegion struct. More... | |
void | ddOpenBalanceRegionCpu (const gmx_domdec_t *dd, DdAllowBalanceRegionReopen allowReopen) |
Open the load balance timing region on the CPU. More... | |
void | ddOpenBalanceRegionGpu (const gmx_domdec_t *dd) |
Open the load balance timing region for the CPU. More... | |
void | ddReopenBalanceRegionCpu (const gmx_domdec_t *dd) |
Re-open the, already opened, load balance timing region. More... | |
void | ddCloseBalanceRegionCpu (const gmx_domdec_t *dd) |
Close the load balance timing region on the CPU side. More... | |
void | ddCloseBalanceRegionGpu (const gmx_domdec_t *dd, float waitCyclesGpuInCpuRegion, DdBalanceRegionWaitedForGpu waitedForGpu) |
Close the load balance timing region on the GPU side. More... | |
|
strong |
|
strong |
|
strong |
|
strong |
BalanceRegion* ddBalanceRegionAllocate | ( | ) |
Returns a pointer to a constructed BalanceRegion
struct.
Should be replaced by a proper constructor once BalanceRegion is a proper class (requires restructering in domdec.cpp).
void ddCloseBalanceRegionCpu | ( | const gmx_domdec_t * | dd | ) |
Close the load balance timing region on the CPU side.
[in,out] | dd | The domain decomposition struct |
void ddCloseBalanceRegionGpu | ( | const gmx_domdec_t * | dd, |
float | waitCyclesGpuInCpuRegion, | ||
DdBalanceRegionWaitedForGpu | waitedForGpu | ||
) |
Close the load balance timing region on the GPU side.
This should be called after the CPU receives the last (local) results from the GPU. The wait time for these results is estimated, depending on the waitedForGpu
parameter. If called on an already closed region, this call does nothing.
[in,out] | dd | The domain decomposition struct |
[in] | waitCyclesGpuInCpuRegion | The time we waited for the GPU earlier, overlapping completely with the open CPU region |
[in] | waitedForGpu | Tells if we waited for the GPU to finish now |
void ddOpenBalanceRegionCpu | ( | const gmx_domdec_t * | dd, |
DdAllowBalanceRegionReopen | allowReopen | ||
) |
Open the load balance timing region on the CPU.
Opens the balancing region for timing how much time it takes to perform the (balancable part of) the MD step. This should be called right after the last communication during the previous step to maximize the region. In practice this means right after the force communication finished or just before neighbor search at search steps. It is assumed that computation done in the region either scales along with the domain size or takes constant time.
[in,out] | dd | The domain decomposition struct |
[in] | allowReopen | Allows calling with a potentially already opened region |
void ddOpenBalanceRegionGpu | ( | const gmx_domdec_t * | dd | ) |
Open the load balance timing region for the CPU.
This can only be called within a region that is open on the CPU side.
void ddReopenBalanceRegionCpu | ( | const gmx_domdec_t * | dd | ) |
Re-open the, already opened, load balance timing region.
This function should be called after every MPI communication that occurs in the main MD loop. Note that the current setup assumes that all MPI communication acts like a global barrier. But if some ranks don't participate in communication or if some ranks communicate faster with neighbors than others, the obtained timings might not accurately reflect the computation time.
[in,out] | dd | The domain decomposition struct |