Gromacs
2018.8
|
#include "config.h"
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <algorithm>
#include "impl_reference_definitions.h"
#include "impl_reference_simd_float.h"
Reference impl., higher-level single prec. SIMD utility functions.
Functions | |
template<int align> | |
static void gmx_simdcall | gmx::gatherLoadTranspose (const float *base, const std::int32_t offset[], SimdFloat *v0, SimdFloat *v1, SimdFloat *v2, SimdFloat *v3) |
Load 4 consecutive floats from each of GMX_SIMD_FLOAT_WIDTH offsets, and transpose into 4 SIMD float variables. More... | |
template<int align> | |
static void gmx_simdcall | gmx::gatherLoadTranspose (const float *base, const std::int32_t offset[], SimdFloat *v0, SimdFloat *v1) |
Load 2 consecutive floats from each of GMX_SIMD_FLOAT_WIDTH offsets, and transpose into 2 SIMD float variables. More... | |
template<int align> | |
static void gmx_simdcall | gmx::gatherLoadUTranspose (const float *base, const std::int32_t offset[], SimdFloat *v0, SimdFloat *v1, SimdFloat *v2) |
Load 3 consecutive floats from each of GMX_SIMD_FLOAT_WIDTH offsets, and transpose into 3 SIMD float variables. More... | |
template<int align> | |
static void gmx_simdcall | gmx::transposeScatterStoreU (float *base, const std::int32_t offset[], SimdFloat v0, SimdFloat v1, SimdFloat v2) |
Transpose and store 3 SIMD floats to 3 consecutive addresses at GMX_SIMD_FLOAT_WIDTH offsets. More... | |
template<int align> | |
static void gmx_simdcall | gmx::transposeScatterIncrU (float *base, const std::int32_t offset[], SimdFloat v0, SimdFloat v1, SimdFloat v2) |
Transpose and add 3 SIMD floats to 3 consecutive addresses at GMX_SIMD_FLOAT_WIDTH offsets. More... | |
template<int align> | |
static void gmx_simdcall | gmx::transposeScatterDecrU (float *base, const std::int32_t offset[], SimdFloat v0, SimdFloat v1, SimdFloat v2) |
Transpose and subtract 3 SIMD floats to 3 consecutive addresses at GMX_SIMD_FLOAT_WIDTH offsets. More... | |
static void gmx_simdcall | gmx::expandScalarsToTriplets (SimdFloat scalar, SimdFloat *triplets0, SimdFloat *triplets1, SimdFloat *triplets2) |
Expand each element of float SIMD variable into three identical consecutive elements in three SIMD outputs. More... | |
template<int align> | |
static void gmx_simdcall | gmx::gatherLoadBySimdIntTranspose (const float *base, SimdFInt32 offset, SimdFloat *v0, SimdFloat *v1, SimdFloat *v2, SimdFloat *v3) |
Load 4 consecutive floats from each of GMX_SIMD_FLOAT_WIDTH offsets specified by a SIMD integer, transpose into 4 SIMD float variables. More... | |
template<int align> | |
static void gmx_simdcall | gmx::gatherLoadUBySimdIntTranspose (const float *base, SimdFInt32 offset, SimdFloat *v0, SimdFloat *v1) |
Load 2 consecutive floats from each of GMX_SIMD_FLOAT_WIDTH offsets (unaligned) specified by SIMD integer, transpose into 2 SIMD floats. More... | |
template<int align> | |
static void gmx_simdcall | gmx::gatherLoadBySimdIntTranspose (const float *base, SimdFInt32 offset, SimdFloat *v0, SimdFloat *v1) |
Load 2 consecutive floats from each of GMX_SIMD_FLOAT_WIDTH offsets specified by a SIMD integer, transpose into 2 SIMD float variables. More... | |
static float gmx_simdcall | gmx::reduceIncr4ReturnSum (float *m, SimdFloat v0, SimdFloat v1, SimdFloat v2, SimdFloat v3) |
Reduce each of four SIMD floats, add those values to four consecutive floats in memory, return sum. More... | |
Higher-level SIMD utilities accessing partial (half-width) SIMD floats. | |
These functions are optional. The are only useful for SIMD implementation where the width is 8 or larger, and where it would be inefficient to process 4*8, 8*8, or more, interactions in parallel. Currently, only Intel provides very wide SIMD implementations, but these also come with excellent support for loading, storing, accessing and shuffling parts of the register in so-called 'lanes' of 4 bytes each. We can use this to load separate parts into the low/high halves of the register in the inner loop of the nonbonded kernel, which e.g. makes it possible to process 4*4 nonbonded interactions as a pattern of 2*8. We can also use implementations with width 16 or greater. To make this more generic, when GMX_SIMD_HAVE_HSIMD_UTIL_REAL is 1, the SIMD implementation provides seven special routines that:
Remember: this is ONLY used when the native SIMD width is large. You will just waste time if you implement it for normal 16-byte SIMD architectures. This is part of the new C++ SIMD interface, so these functions are only available when using C++. Since some Gromacs code reliying on the SIMD module is still C (not C++), we have kept the C-style naming for now - this will change once we are entirely C++. | |
static SimdFloat gmx_simdcall | gmx::loadDualHsimd (const float *m0, const float *m1) |
Load low & high parts of SIMD float from different locations. More... | |
static SimdFloat gmx_simdcall | gmx::loadDuplicateHsimd (const float *m) |
Load half-SIMD-width float data, spread to both halves. More... | |
static SimdFloat gmx_simdcall | gmx::loadU1DualHsimd (const float *m) |
Load two floats, spread 1st in low half, 2nd in high half. More... | |
static void gmx_simdcall | gmx::storeDualHsimd (float *m0, float *m1, SimdFloat a) |
Store low & high parts of SIMD float to different locations. More... | |
static void gmx_simdcall | gmx::incrDualHsimd (float *m0, float *m1, SimdFloat a) |
Add each half of SIMD variable to separate memory adresses. More... | |
static void gmx_simdcall | gmx::decrHsimd (float *m, SimdFloat a) |
Add the two halves of a SIMD float, subtract the sum from half-SIMD-width consecutive floats in memory. More... | |
template<int align> | |
static void gmx_simdcall | gmx::gatherLoadTransposeHsimd (const float *base0, const float *base1, const std::int32_t offset[], SimdFloat *v0, SimdFloat *v1) |
Load 2 consecutive floats from each of GMX_SIMD_FLOAT_WIDTH/2 offsets, transpose into SIMD float (low half from base0, high from base1). More... | |
static float gmx_simdcall | gmx::reduceIncr4ReturnSumHsimd (float *m, SimdFloat v0, SimdFloat v1) |
Reduce the 4 half-SIMD-with floats in 2 SIMD variables (sum halves), increment four consecutive floats in memory, return sum. More... | |
static SimdFloat gmx_simdcall | gmx::loadUNDuplicate4 (const float *m) |
Load N floats and duplicate them 4 times each. More... | |
static SimdFloat gmx_simdcall | gmx::load4DuplicateN (const float *m) |
Load 4 floats and duplicate them N times each. More... | |
static SimdFloat gmx_simdcall | gmx::loadU4NOffset (const float *m, int offset) |
Load floats in blocks of 4 at fixed offsets. More... | |
Variables | |
static const int | gmx::c_simdBestPairAlignmentFloat = 2 |
Best alignment to use for aligned pairs of float data. More... | |