Gromacs
2023
Provides an architecture-independent way of doing SIMD coding.
An overview of the SIMD implementation is provided in Single-instruction Multiple-data (SIMD) coding. The details are documented in gromacs/simd/simd.h and in the reference implementation, impl_reference.h.
Namespaces | |
gmx | |
Generic GROMACS namespace. | |
Constant width-4 double precision SIMD types and instructions | |
static Simd4Double gmx_simdcall | gmx::load4 (const double *m) |
Load 4 double values from aligned memory into SIMD4 variable. More... | |
static void gmx_simdcall | gmx::store4 (double *m, Simd4Double a) |
Store the contents of SIMD4 double to aligned memory m. More... | |
static Simd4Double gmx_simdcall | gmx::load4U (const double *m) |
Load SIMD4 double from unaligned memory. More... | |
static void gmx_simdcall | gmx::store4U (double *m, Simd4Double a) |
Store SIMD4 double to unaligned memory. More... | |
static Simd4Double gmx_simdcall | gmx::simd4SetZeroD () |
Set all SIMD4 double elements to 0. More... | |
static Simd4Double gmx_simdcall | gmx::operator& (Simd4Double a, Simd4Double b) |
Bitwise and for two SIMD4 double variables. More... | |
static Simd4Double gmx_simdcall | gmx::andNot (Simd4Double a, Simd4Double b) |
Bitwise andnot for two SIMD4 double variables. c=(~a) & b. More... | |
static Simd4Double gmx_simdcall | gmx::operator| (Simd4Double a, Simd4Double b) |
Bitwise or for two SIMD4 doubles. More... | |
static Simd4Double gmx_simdcall | gmx::operator^ (Simd4Double a, Simd4Double b) |
Bitwise xor for two SIMD4 double variables. More... | |
static Simd4Double gmx_simdcall | gmx::operator+ (Simd4Double a, Simd4Double b) |
Add two double SIMD4 variables. More... | |
static Simd4Double gmx_simdcall | gmx::operator- (Simd4Double a, Simd4Double b) |
Subtract two SIMD4 variables. More... | |
static Simd4Double gmx_simdcall | gmx::operator- (Simd4Double a) |
SIMD4 floating-point negate. More... | |
static Simd4Double gmx_simdcall | gmx::operator* (Simd4Double a, Simd4Double b) |
Multiply two SIMD4 variables. More... | |
static Simd4Double gmx_simdcall | gmx::fma (Simd4Double a, Simd4Double b, Simd4Double c) |
SIMD4 Fused-multiply-add. Result is a*b+c. More... | |
static Simd4Double gmx_simdcall | gmx::fms (Simd4Double a, Simd4Double b, Simd4Double c) |
SIMD4 Fused-multiply-subtract. Result is a*b-c. More... | |
static Simd4Double gmx_simdcall | gmx::fnma (Simd4Double a, Simd4Double b, Simd4Double c) |
SIMD4 Fused-negated-multiply-add. Result is -a*b+c. More... | |
static Simd4Double gmx_simdcall | gmx::fnms (Simd4Double a, Simd4Double b, Simd4Double c) |
SIMD4 Fused-negated-multiply-subtract. Result is -a*b-c. More... | |
static Simd4Double gmx_simdcall | gmx::rsqrt (Simd4Double x) |
SIMD4 1.0/sqrt(x) lookup. More... | |
static Simd4Double gmx_simdcall | gmx::abs (Simd4Double a) |
SIMD4 Floating-point abs(). More... | |
static Simd4Double gmx_simdcall | gmx::max (Simd4Double a, Simd4Double b) |
Set each SIMD4 element to the largest from two variables. More... | |
static Simd4Double gmx_simdcall | gmx::min (Simd4Double a, Simd4Double b) |
Set each SIMD4 element to the smallest from two variables. More... | |
static Simd4Double gmx_simdcall | gmx::round (Simd4Double a) |
SIMD4 Round to nearest integer value (in floating-point format). More... | |
static Simd4Double gmx_simdcall | gmx::trunc (Simd4Double a) |
Truncate SIMD4, i.e. round towards zero - common hardware instruction. More... | |
static double gmx_simdcall | gmx::dotProduct (Simd4Double a, Simd4Double b) |
Return dot product of two double precision SIMD4 variables. More... | |
static void gmx_simdcall | gmx::transpose (Simd4Double *v0, Simd4Double *v1, Simd4Double *v2, Simd4Double *v3) |
SIMD4 double transpose. More... | |
static Simd4DBool gmx_simdcall | gmx::operator== (Simd4Double a, Simd4Double b) |
a==b for SIMD4 double. More... | |
static Simd4DBool gmx_simdcall | gmx::operator!= (Simd4Double a, Simd4Double b) |
a!=b for SIMD4 double. More... | |
static Simd4DBool gmx_simdcall | gmx::operator< (Simd4Double a, Simd4Double b) |
a<b for SIMD4 double. More... | |
static Simd4DBool gmx_simdcall | gmx::operator<= (Simd4Double a, Simd4Double b) |
a<=b for SIMD4 double. More... | |
static Simd4DBool gmx_simdcall | gmx::operator&& (Simd4DBool a, Simd4DBool b) |
Logical and on double precision SIMD4 booleans. More... | |
static Simd4DBool gmx_simdcall | gmx::operator|| (Simd4DBool a, Simd4DBool b) |
Logical or on double precision SIMD4 booleans. More... | |
static bool gmx_simdcall | gmx::anyTrue (Simd4DBool a) |
Returns true if any of the booleans in SIMD4 a is true, otherwise false. More... | |
static Simd4Double gmx_simdcall | gmx::selectByMask (Simd4Double a, Simd4DBool mask) |
Select from double precision SIMD4 variable where boolean is true. More... | |
static Simd4Double gmx_simdcall | gmx::selectByNotMask (Simd4Double a, Simd4DBool mask) |
Select from double precision SIMD4 variable where boolean is false. More... | |
static Simd4Double gmx_simdcall | gmx::blend (Simd4Double a, Simd4Double b, Simd4DBool sel) |
Vector-blend SIMD4 selection. More... | |
static double gmx_simdcall | gmx::reduce (Simd4Double a) |
Return sum of all elements in SIMD4 double variable. More... | |
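The four fused operations listed above differ only in the signs applied to the product and to the addend. The following is a minimal scalar sketch of those sign conventions, not GROMACS code: `Vec4`, `mapLanes` and the `*Model` functions are hypothetical names, and the arithmetic is written as a separate multiply and add, so it does not reproduce the single rounding of a hardware fused instruction.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a width-4 SIMD value. This is NOT gmx::Simd4Double.
using Vec4 = std::array<double, 4>;

// Apply a three-operand scalar operation independently to each of the four lanes.
template<typename Op>
Vec4 mapLanes(const Vec4& a, const Vec4& b, const Vec4& c, Op op)
{
    Vec4 r{};
    for (std::size_t i = 0; i < r.size(); ++i)
    {
        r[i] = op(a[i], b[i], c[i]);
    }
    return r;
}

// a*b+c
Vec4 fmaModel(const Vec4& a, const Vec4& b, const Vec4& c)
{
    return mapLanes(a, b, c, [](double x, double y, double z) { return x * y + z; });
}

// a*b-c
Vec4 fmsModel(const Vec4& a, const Vec4& b, const Vec4& c)
{
    return mapLanes(a, b, c, [](double x, double y, double z) { return x * y - z; });
}

// -a*b+c
Vec4 fnmaModel(const Vec4& a, const Vec4& b, const Vec4& c)
{
    return mapLanes(a, b, c, [](double x, double y, double z) { return -(x * y) + z; });
}

// -a*b-c
Vec4 fnmsModel(const Vec4& a, const Vec4& b, const Vec4& c)
{
    return mapLanes(a, b, c, [](double x, double y, double z) { return -(x * y) - z; });
}
```

With a = {1, 2, 3, 4}, b = {2, 2, 2, 2} and c = {1, 1, 1, 1}, the first lane gives 3, 1, -1 and -3 for the four models respectively.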
Constant width-4 single precision SIMD types and instructions | |
static Simd4Float gmx_simdcall | gmx::load4 (const float *m) |
Load 4 float values from aligned memory into SIMD4 variable. More... | |
static void gmx_simdcall | gmx::store4 (float *m, Simd4Float a) |
Store the contents of SIMD4 float to aligned memory m. More... | |
static Simd4Float gmx_simdcall | gmx::load4U (const float *m) |
Load SIMD4 float from unaligned memory. More... | |
static void gmx_simdcall | gmx::store4U (float *m, Simd4Float a) |
Store SIMD4 float to unaligned memory. More... | |
static Simd4Float gmx_simdcall | gmx::simd4SetZeroF () |
Set all SIMD4 float elements to 0. More... | |
static Simd4Float gmx_simdcall | gmx::operator& (Simd4Float a, Simd4Float b) |
Bitwise and for two SIMD4 float variables. More... | |
static Simd4Float gmx_simdcall | gmx::andNot (Simd4Float a, Simd4Float b) |
Bitwise andnot for two SIMD4 float variables. c=(~a) & b. More... | |
static Simd4Float gmx_simdcall | gmx::operator| (Simd4Float a, Simd4Float b) |
Bitwise or for two SIMD4 floats. More... | |
static Simd4Float gmx_simdcall | gmx::operator^ (Simd4Float a, Simd4Float b) |
Bitwise xor for two SIMD4 float variables. More... | |
static Simd4Float gmx_simdcall | gmx::operator+ (Simd4Float a, Simd4Float b) |
Add two float SIMD4 variables. More... | |
static Simd4Float gmx_simdcall | gmx::operator- (Simd4Float a, Simd4Float b) |
Subtract two SIMD4 variables. More... | |
static Simd4Float gmx_simdcall | gmx::operator- (Simd4Float a) |
SIMD4 floating-point negate. More... | |
static Simd4Float gmx_simdcall | gmx::operator* (Simd4Float a, Simd4Float b) |
Multiply two SIMD4 variables. More... | |
static Simd4Float gmx_simdcall | gmx::fma (Simd4Float a, Simd4Float b, Simd4Float c) |
SIMD4 Fused-multiply-add. Result is a*b+c. More... | |
static Simd4Float gmx_simdcall | gmx::fms (Simd4Float a, Simd4Float b, Simd4Float c) |
SIMD4 Fused-multiply-subtract. Result is a*b-c. More... | |
static Simd4Float gmx_simdcall | gmx::fnma (Simd4Float a, Simd4Float b, Simd4Float c) |
SIMD4 Fused-negated-multiply-add. Result is -a*b+c. More... | |
static Simd4Float gmx_simdcall | gmx::fnms (Simd4Float a, Simd4Float b, Simd4Float c) |
SIMD4 Fused-negated-multiply-subtract. Result is -a*b-c. More... | |
static Simd4Float gmx_simdcall | gmx::rsqrt (Simd4Float x) |
SIMD4 1.0/sqrt(x) lookup. More... | |
static Simd4Float gmx_simdcall | gmx::abs (Simd4Float a) |
SIMD4 Floating-point fabs(). More... | |
static Simd4Float gmx_simdcall | gmx::max (Simd4Float a, Simd4Float b) |
Set each SIMD4 element to the largest from two variables. More... | |
static Simd4Float gmx_simdcall | gmx::min (Simd4Float a, Simd4Float b) |
Set each SIMD4 element to the smallest from two variables. More... | |
static Simd4Float gmx_simdcall | gmx::round (Simd4Float a) |
SIMD4 Round to nearest integer value (in floating-point format). More... | |
static Simd4Float gmx_simdcall | gmx::trunc (Simd4Float a) |
Truncate SIMD4, i.e. round towards zero - common hardware instruction. More... | |
static float gmx_simdcall | gmx::dotProduct (Simd4Float a, Simd4Float b) |
Return dot product of two single precision SIMD4 variables. More... | |
static void gmx_simdcall | gmx::transpose (Simd4Float *v0, Simd4Float *v1, Simd4Float *v2, Simd4Float *v3) |
SIMD4 float transpose. More... | |
static Simd4FBool gmx_simdcall | gmx::operator== (Simd4Float a, Simd4Float b) |
a==b for SIMD4 float. More... | |
static Simd4FBool gmx_simdcall | gmx::operator!= (Simd4Float a, Simd4Float b) |
a!=b for SIMD4 float. More... | |
static Simd4FBool gmx_simdcall | gmx::operator< (Simd4Float a, Simd4Float b) |
a<b for SIMD4 float. More... | |
static Simd4FBool gmx_simdcall | gmx::operator<= (Simd4Float a, Simd4Float b) |
a<=b for SIMD4 float. More... | |
static Simd4FBool gmx_simdcall | gmx::operator&& (Simd4FBool a, Simd4FBool b) |
Logical and on single precision SIMD4 booleans. More... | |
static Simd4FBool gmx_simdcall | gmx::operator|| (Simd4FBool a, Simd4FBool b) |
Logical or on single precision SIMD4 booleans. More... | |
static bool gmx_simdcall | gmx::anyTrue (Simd4FBool a) |
Returns true if any of the booleans in SIMD4 a is true, otherwise false. More... | |
static Simd4Float gmx_simdcall | gmx::selectByMask (Simd4Float a, Simd4FBool mask) |
Select from single precision SIMD4 variable where boolean is true. More... | |
static Simd4Float gmx_simdcall | gmx::selectByNotMask (Simd4Float a, Simd4FBool mask) |
Select from single precision SIMD4 variable where boolean is false. More... | |
static Simd4Float gmx_simdcall | gmx::blend (Simd4Float a, Simd4Float b, Simd4FBool sel) |
Vector-blend SIMD4 selection. More... | |
static float gmx_simdcall | gmx::reduce (Simd4Float a) |
Return sum of all elements in SIMD4 float variable. More... | |
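The three selection functions above can be easier to tell apart with a lane-by-lane model. The sketch below is illustrative only and is not GROMACS code. It assumes that unselected lanes are set to zero by selectByMask and selectByNotMask, and that blend picks b where sel is true and a otherwise; neither detail is stated in this listing, so check the detailed documentation before relying on it. `Vec4f`, `Mask4` and the `*Model` names are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical stand-ins for a width-4 float SIMD value and its boolean mask.
using Vec4f = std::array<float, 4>;
using Mask4 = std::array<bool, 4>;

// Keep a[i] where mask[i] is true; ASSUMPTION: other lanes become 0.
Vec4f selectByMaskModel(const Vec4f& a, const Mask4& mask)
{
    Vec4f r{};
    for (std::size_t i = 0; i < r.size(); ++i)
    {
        r[i] = mask[i] ? a[i] : 0.0F;
    }
    return r;
}

// Keep a[i] where mask[i] is false; ASSUMPTION: other lanes become 0.
Vec4f selectByNotMaskModel(const Vec4f& a, const Mask4& mask)
{
    Vec4f r{};
    for (std::size_t i = 0; i < r.size(); ++i)
    {
        r[i] = mask[i] ? 0.0F : a[i];
    }
    return r;
}

// ASSUMPTION: take b[i] where sel[i] is true, a[i] otherwise.
Vec4f blendModel(const Vec4f& a, const Vec4f& b, const Mask4& sel)
{
    Vec4f r{};
    for (std::size_t i = 0; i < r.size(); ++i)
    {
        r[i] = sel[i] ? b[i] : a[i];
    }
    return r;
}
```

Under these assumptions, selectByMask and selectByNotMask partition a vector into two complementary halves, while blend merges two different vectors.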
SIMD implementation load/store operations for double precision floating point | |
static SimdDouble gmx_simdcall | gmx::simdLoad (const double *m, SimdDoubleTag={}) |
Load GMX_SIMD_DOUBLE_WIDTH numbers from aligned memory. More... | |
static void gmx_simdcall | gmx::store (double *m, SimdDouble a) |
Store the contents of SIMD double variable to aligned memory m. More... | |
static SimdDouble gmx_simdcall | gmx::simdLoadU (const double *m, SimdDoubleTag={}) |
Load SIMD double from unaligned memory. More... | |
static void gmx_simdcall | gmx::storeU (double *m, SimdDouble a) |
Store SIMD double to unaligned memory. More... | |
static SimdDouble gmx_simdcall | gmx::setZeroD () |
Set all SIMD double variable elements to 0.0. More... | |
SIMD implementation load/store operations for integers (corresponding to double) | |
static SimdDInt32 gmx_simdcall | gmx::simdLoad (const std::int32_t *m, SimdDInt32Tag) |
Load aligned SIMD integer data, width corresponds to gmx::SimdDouble. More... | |
static void gmx_simdcall | gmx::store (std::int32_t *m, SimdDInt32 a) |
Store aligned SIMD integer data, width corresponds to gmx::SimdDouble. More... | |
static SimdDInt32 gmx_simdcall | gmx::simdLoadU (const std::int32_t *m, SimdDInt32Tag) |
Load unaligned integer SIMD data, width corresponds to gmx::SimdDouble. More... | |
static void gmx_simdcall | gmx::storeU (std::int32_t *m, SimdDInt32 a) |
Store unaligned SIMD integer data, width corresponds to gmx::SimdDouble. More... | |
static SimdDInt32 gmx_simdcall | gmx::setZeroDI () |
Set all SIMD (double) integer variable elements to 0. More... | |
template<int index> | |
static std::int32_t gmx_simdcall | gmx::extract (SimdDInt32 a) |
Extract the element selected by the template parameter index from gmx::SimdDInt32. More... | |
SIMD implementation double precision floating-point bitwise logical operations | |
static SimdDouble gmx_simdcall | gmx::operator& (SimdDouble a, SimdDouble b) |
Bitwise and for two SIMD double variables. More... | |
static SimdDouble gmx_simdcall | gmx::andNot (SimdDouble a, SimdDouble b) |
Bitwise andnot for SIMD double. More... | |
static SimdDouble gmx_simdcall | gmx::operator| (SimdDouble a, SimdDouble b) |
Bitwise or for SIMD double. More... | |
static SimdDouble gmx_simdcall | gmx::operator^ (SimdDouble a, SimdDouble b) |
Bitwise xor for SIMD double. More... | |
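Bitwise operations on floating-point SIMD variables act on the raw bit patterns. As a rough single-lane illustration of the andnot convention c=(~a) & b given for the SIMD4 variant, the sketch below clears the sign bit of a double. It is not GROMACS code: `toBits`, `fromBits`, `andNotModel` and `absViaAndNot` are hypothetical helpers, and IEEE-754 binary64 doubles are assumed.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Reinterpret a double as its 64-bit pattern (memcpy avoids aliasing problems).
std::uint64_t toBits(double x)
{
    std::uint64_t u = 0;
    std::memcpy(&u, &x, sizeof u);
    return u;
}

// Reinterpret a 64-bit pattern as a double.
double fromBits(std::uint64_t u)
{
    double x = 0.0;
    std::memcpy(&x, &u, sizeof x);
    return x;
}

// Scalar model of one lane of andNot(a, b): (~a) & b on the bit patterns.
double andNotModel(double a, double b)
{
    return fromBits(~toBits(a) & toBits(b));
}

// ASSUMPTION: IEEE-754, where -0.0 has only the sign bit set.
// Then (~(-0.0)) & b clears the sign bit of b, which yields |b|.
double absViaAndNot(double b)
{
    return andNotModel(-0.0, b);
}
```

Note the argument order: it is the first operand that is complemented, so swapping a and b gives a different result.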
SIMD implementation double precision floating-point arithmetics | |
static SimdDouble gmx_simdcall | gmx::operator+ (SimdDouble a, SimdDouble b) |
Add two double SIMD variables. More... | |
static SimdDouble gmx_simdcall | gmx::operator- (SimdDouble a, SimdDouble b) |
Subtract two double SIMD variables. More... | |
static SimdDouble gmx_simdcall | gmx::operator- (SimdDouble a) |
SIMD double precision negate. More... | |
static SimdDouble gmx_simdcall | gmx::operator* (SimdDouble a, SimdDouble b) |
Multiply two double SIMD variables. More... | |
static SimdDouble gmx_simdcall | gmx::fma (SimdDouble a, SimdDouble b, SimdDouble c) |
SIMD double Fused-multiply-add. Result is a*b+c. More... | |
static SimdDouble gmx_simdcall | gmx::fms (SimdDouble a, SimdDouble b, SimdDouble c) |
SIMD double Fused-multiply-subtract. Result is a*b-c. More... | |
static SimdDouble gmx_simdcall | gmx::fnma (SimdDouble a, SimdDouble b, SimdDouble c) |
SIMD double Fused-negated-multiply-add. Result is -a*b+c. More... | |
static SimdDouble gmx_simdcall | gmx::fnms (SimdDouble a, SimdDouble b, SimdDouble c) |
SIMD double Fused-negated-multiply-subtract. Result is -a*b-c. More... | |
static SimdDouble gmx_simdcall | gmx::rsqrt (SimdDouble x) |
double SIMD 1.0/sqrt(x) lookup. More... | |
static SimdDouble gmx_simdcall | gmx::rcp (SimdDouble x) |
SIMD double 1.0/x lookup. More... | |
static SimdDouble gmx_simdcall | gmx::maskAdd (SimdDouble a, SimdDouble b, SimdDBool m) |
Add two double SIMD variables, masked version. More... | |
static SimdDouble gmx_simdcall | gmx::maskzMul (SimdDouble a, SimdDouble b, SimdDBool m) |
Multiply two double SIMD variables, masked version. More... | |
static SimdDouble gmx_simdcall | gmx::maskzFma (SimdDouble a, SimdDouble b, SimdDouble c, SimdDBool m) |
SIMD double fused multiply-add, masked version. More... | |
static SimdDouble gmx_simdcall | gmx::maskzRsqrt (SimdDouble x, SimdDBool m) |
SIMD double 1.0/sqrt(x) lookup, masked version. More... | |
static SimdDouble gmx_simdcall | gmx::maskzRcp (SimdDouble x, SimdDBool m) |
SIMD double 1.0/x lookup, masked version. More... | |
static SimdDouble gmx_simdcall | gmx::abs (SimdDouble a) |
SIMD double floating-point fabs(). More... | |
static SimdDouble gmx_simdcall | gmx::max (SimdDouble a, SimdDouble b) |
Set each SIMD double element to the largest from two variables. More... | |
static SimdDouble gmx_simdcall | gmx::min (SimdDouble a, SimdDouble b) |
Set each SIMD double element to the smallest from two variables. More... | |
static SimdDouble gmx_simdcall | gmx::round (SimdDouble a) |
SIMD double round to nearest integer value (in floating-point format). More... | |
static SimdDouble gmx_simdcall | gmx::trunc (SimdDouble a) |
Truncate SIMD double, i.e. round towards zero - common hardware instruction. More... | |
template<MathOptimization opt = MathOptimization::Safe> | |
static SimdDouble gmx_simdcall | gmx::frexp (SimdDouble value, SimdDInt32 *exponent) |
Extract (integer) exponent and fraction from double precision SIMD. More... | |
template<MathOptimization opt = MathOptimization::Safe> | |
static SimdDouble gmx_simdcall | gmx::ldexp (SimdDouble value, SimdDInt32 exponent) |
Multiply a SIMD double value by 2 raised to the power given by exponent. More... | |
static double gmx_simdcall | gmx::reduce (SimdDouble a) |
Return sum of all elements in SIMD double variable. More... | |
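The masked arithmetic variants in this group come in two flavours, distinguished by name. The scalar sketch below shows one plausible reading, and it is an assumption rather than something this listing states: maskAdd returns a unchanged where the mask is false, while the maskz functions return zero there. It is not GROMACS code; `Vec`, `Mask` and the `*Model` names are hypothetical, a width of 4 is chosen arbitrarily, and the reciprocal is computed exactly rather than as an approximate lookup.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical stand-ins; the real SIMD width depends on the architecture.
using Vec  = std::array<double, 4>;
using Mask = std::array<bool, 4>;

// ASSUMPTION: a+b where m is true, a unchanged otherwise.
Vec maskAddModel(const Vec& a, const Vec& b, const Mask& m)
{
    Vec r{};
    for (std::size_t i = 0; i < r.size(); ++i)
    {
        r[i] = m[i] ? a[i] + b[i] : a[i];
    }
    return r;
}

// ASSUMPTION: a*b where m is true, zero otherwise (the "z" in maskz).
Vec maskzMulModel(const Vec& a, const Vec& b, const Mask& m)
{
    Vec r{};
    for (std::size_t i = 0; i < r.size(); ++i)
    {
        r[i] = m[i] ? a[i] * b[i] : 0.0;
    }
    return r;
}

// ASSUMPTION: 1/x where m is true, zero otherwise. Masked-out lanes are never
// divided, so a zero input in such a lane does not produce infinity.
Vec maskzRcpModel(const Vec& x, const Mask& m)
{
    Vec r{};
    for (std::size_t i = 0; i < r.size(); ++i)
    {
        r[i] = m[i] ? 1.0 / x[i] : 0.0;
    }
    return r;
}
```

The reciprocal case shows why such variants are useful: lanes holding padding or excluded entries can carry zeros without generating infinities.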
SIMD implementation double precision floating-point comparison, boolean, selection. | |
static SimdDBool gmx_simdcall | gmx::operator== (SimdDouble a, SimdDouble b) |
SIMD a==b for double SIMD. More... | |
static SimdDBool gmx_simdcall | gmx::operator!= (SimdDouble a, SimdDouble b) |
SIMD a!=b for double SIMD. More... | |
static SimdDBool gmx_simdcall | gmx::operator< (SimdDouble a, SimdDouble b) |
SIMD a<b for double SIMD. More... | |
static SimdDBool gmx_simdcall | gmx::operator<= (SimdDouble a, SimdDouble b) |
SIMD a<=b for double SIMD. More... | |
static SimdDBool gmx_simdcall | gmx::testBits (SimdDouble a) |
Return true if any bits are set in the double precision SIMD. More... | |
static SimdDBool gmx_simdcall | gmx::operator&& (SimdDBool a, SimdDBool b) |
Logical and on double precision SIMD booleans. More... | |
static SimdDBool gmx_simdcall | gmx::operator|| (SimdDBool a, SimdDBool b) |
Logical or on double precision SIMD booleans. More... | |
static bool gmx_simdcall | gmx::anyTrue (SimdDBool a) |
Returns true if any of the booleans in SIMD a is true, otherwise false. More... | |
static SimdDouble gmx_simdcall | gmx::selectByMask (SimdDouble a, SimdDBool mask) |
Select from double precision SIMD variable where boolean is true. More... | |
static SimdDouble gmx_simdcall | gmx::selectByNotMask (SimdDouble a, SimdDBool mask) |
Select from double precision SIMD variable where boolean is false. More... | |
static SimdDouble gmx_simdcall | gmx::blend (SimdDouble a, SimdDouble b, SimdDBool sel) |
Vector-blend SIMD double selection. More... | |
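Comparisons return a per-element boolean variable, which can be combined with && or || and then collapsed to a single bool with anyTrue. A minimal scalar sketch of that pattern follows. It is not GROMACS code: `Vec4`, `Mask4`, the `*Model` functions and `anyInRange` are hypothetical names chosen for illustration.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical stand-ins for a SIMD double variable and its boolean type.
using Vec4  = std::array<double, 4>;
using Mask4 = std::array<bool, 4>;

// Lane-wise a < b.
Mask4 lessThanModel(const Vec4& a, const Vec4& b)
{
    Mask4 r{};
    for (std::size_t i = 0; i < r.size(); ++i)
    {
        r[i] = a[i] < b[i];
    }
    return r;
}

// Lane-wise a <= b.
Mask4 lessEqualModel(const Vec4& a, const Vec4& b)
{
    Mask4 r{};
    for (std::size_t i = 0; i < r.size(); ++i)
    {
        r[i] = a[i] <= b[i];
    }
    return r;
}

// Lane-wise logical and of two masks.
Mask4 andModel(const Mask4& a, const Mask4& b)
{
    Mask4 r{};
    for (std::size_t i = 0; i < r.size(); ++i)
    {
        r[i] = a[i] && b[i];
    }
    return r;
}

// True if at least one lane of the mask is true.
bool anyTrueModel(const Mask4& m)
{
    for (bool lane : m)
    {
        if (lane)
        {
            return true;
        }
    }
    return false;
}

// True if any lane of x lies in the half-open interval [lo, hi).
bool anyInRange(const Vec4& x, double lo, double hi)
{
    Vec4 loV{};
    Vec4 hiV{};
    loV.fill(lo);
    hiV.fill(hi);
    return anyTrueModel(andModel(lessEqualModel(loV, x), lessThanModel(x, hiV)));
}
```

A reduction like this is typically what allows a kernel to skip work when no lane needs it.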
SIMD implementation integer (corresponding to double) bitwise logical operations | |
static SimdDInt32 gmx_simdcall | gmx::operator& (SimdDInt32 a, SimdDInt32 b) |
Integer SIMD bitwise and. More... | |
static SimdDInt32 gmx_simdcall | gmx::andNot (SimdDInt32 a, SimdDInt32 b) |
Integer SIMD bitwise andnot. c=(~a) & b. More... | |
static SimdDInt32 gmx_simdcall | gmx::operator| (SimdDInt32 a, SimdDInt32 b) |
Integer SIMD bitwise or. More... | |
static SimdDInt32 gmx_simdcall | gmx::operator^ (SimdDInt32 a, SimdDInt32 b) |
Integer SIMD bitwise xor. More... | |
SIMD implementation integer (corresponding to double) arithmetics | |
static SimdDInt32 gmx_simdcall | gmx::operator+ (SimdDInt32 a, SimdDInt32 b) |
Add SIMD integers. More... | |
static SimdDInt32 gmx_simdcall | gmx::operator- (SimdDInt32 a, SimdDInt32 b) |
Subtract SIMD integers. More... | |
static SimdDInt32 gmx_simdcall | gmx::operator* (SimdDInt32 a, SimdDInt32 b) |
Multiply SIMD integers. More... | |
SIMD implementation integer (corresponding to double) comparisons, boolean, selection | |
static SimdDIBool gmx_simdcall | gmx::operator== (SimdDInt32 a, SimdDInt32 b) |
Equality comparison of two integers corresponding to double values. More... | |
static SimdDIBool gmx_simdcall | gmx::operator< (SimdDInt32 a, SimdDInt32 b) |
Less-than comparison of two SIMD integers corresponding to double values. More... | |
static SimdDIBool gmx_simdcall | gmx::testBits (SimdDInt32 a) |
Check if any bit is set in each element. More... | |
static SimdDIBool gmx_simdcall | gmx::operator&& (SimdDIBool a, SimdDIBool b) |
Logical AND on SimdDIBool. More... | |
static SimdDIBool gmx_simdcall | gmx::operator|| (SimdDIBool a, SimdDIBool b) |
Logical OR on SimdDIBool. More... | |
static bool gmx_simdcall | gmx::anyTrue (SimdDIBool a) |
Returns true if any of the booleans in a is true, otherwise false. More... | |
static SimdDInt32 gmx_simdcall | gmx::selectByMask (SimdDInt32 a, SimdDIBool mask) |
Select from gmx::SimdDInt32 variable where boolean is true. More... | |
static SimdDInt32 gmx_simdcall | gmx::selectByNotMask (SimdDInt32 a, SimdDIBool mask) |
Select from gmx::SimdDInt32 variable where boolean is false. More... | |
static SimdDInt32 gmx_simdcall | gmx::blend (SimdDInt32 a, SimdDInt32 b, SimdDIBool sel) |
Vector-blend SIMD integer selection. More... | |
SIMD implementation conversion operations | |
static SimdDInt32 gmx_simdcall | gmx::cvtR2I (SimdDouble a) |
Round double precision floating point to integer. More... | |
static SimdDInt32 gmx_simdcall | gmx::cvttR2I (SimdDouble a) |
Truncate double precision floating point to integer. More... | |
static SimdDouble gmx_simdcall | gmx::cvtI2R (SimdDInt32 a) |
Convert integer to double precision floating point. More... | |
static SimdDIBool gmx_simdcall | gmx::cvtB2IB (SimdDBool a) |
Convert from double precision boolean to corresponding integer boolean. More... | |
static SimdDBool gmx_simdcall | gmx::cvtIB2B (SimdDIBool a) |
Convert from integer boolean to corresponding double precision boolean. More... | |
static SimdDouble gmx_simdcall | gmx::cvtF2D (SimdFloat gmx_unused f) |
Convert SIMD float to double. More... | |
static SimdFloat gmx_simdcall | gmx::cvtD2F (SimdDouble gmx_unused d) |
Convert SIMD double to float. More... | |
static void gmx_simdcall | gmx::cvtF2DD (SimdFloat gmx_unused f, SimdDouble gmx_unused *d0, SimdDouble gmx_unused *d1) |
Convert one SIMD float to two SIMD doubles. More... | |
static SimdFloat gmx_simdcall | gmx::cvtDD2F (SimdDouble gmx_unused d0, SimdDouble gmx_unused d1) |
Convert two SIMD doubles to one SIMD float. More... | |
static SimdFInt32 gmx_simdcall | gmx::cvtR2I (SimdFloat a) |
Round single precision floating point to integer. More... | |
static SimdFInt32 gmx_simdcall | gmx::cvttR2I (SimdFloat a) |
Truncate single precision floating point to integer. More... | |
static SimdFloat gmx_simdcall | gmx::cvtI2R (SimdFInt32 a) |
Convert integer to single precision floating point. More... | |
static SimdFIBool gmx_simdcall | gmx::cvtB2IB (SimdFBool a) |
Convert from single precision boolean to corresponding integer boolean. More... | |
static SimdFBool gmx_simdcall | gmx::cvtIB2B (SimdFIBool a) |
Convert from integer boolean to corresponding single precision boolean. More... | |
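The difference between the rounding and truncating conversions is easiest to see on negative and fractional inputs. The scalar sketch below is illustrative and is not GROMACS code. `VecD`, `VecI` and the `*Model` functions are hypothetical, and the use of std::lround is an assumption: this listing does not say how exact ties such as 2.5 are resolved, so the example avoids them.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-ins for a SIMD double and its matching integer type.
using VecD = std::array<double, 4>;
using VecI = std::array<std::int32_t, 4>;

// Round each lane to the nearest integer.
// ASSUMPTION: std::lround is used here; tie handling in the real API may differ.
VecI cvtR2IModel(const VecD& a)
{
    VecI r{};
    for (std::size_t i = 0; i < r.size(); ++i)
    {
        r[i] = static_cast<std::int32_t>(std::lround(a[i]));
    }
    return r;
}

// Truncate each lane towards zero (plain floating-point to integer cast).
VecI cvttR2IModel(const VecD& a)
{
    VecI r{};
    for (std::size_t i = 0; i < r.size(); ++i)
    {
        r[i] = static_cast<std::int32_t>(a[i]);
    }
    return r;
}

// Convert each integer lane back to floating point.
VecD cvtI2RModel(const VecI& a)
{
    VecD r{};
    for (std::size_t i = 0; i < r.size(); ++i)
    {
        r[i] = static_cast<double>(a[i]);
    }
    return r;
}
```

For an input lane of -2.7, the rounding model gives -3 while the truncating model gives -2.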
SIMD implementation load/store operations for single precision floating point | |
static SimdFloat gmx_simdcall | gmx::simdLoad (const float *m, SimdFloatTag={}) |
Load GMX_SIMD_FLOAT_WIDTH float numbers from aligned memory. More... | |
static void gmx_simdcall | gmx::store (float *m, SimdFloat a) |
Store the contents of SIMD float variable to aligned memory m. More... | |
static SimdFloat gmx_simdcall | gmx::simdLoadU (const float *m, SimdFloatTag={}) |
Load SIMD float from unaligned memory. More... | |
static void gmx_simdcall | gmx::storeU (float *m, SimdFloat a) |
Store SIMD float to unaligned memory. More... | |
static SimdFloat gmx_simdcall | gmx::setZeroF () |
Set all SIMD float variable elements to 0.0. More... | |
SIMD implementation load/store operations for integers (corresponding to float) | |
static SimdFInt32 gmx_simdcall | gmx::simdLoad (const std::int32_t *m, SimdFInt32Tag) |
Load aligned SIMD integer data, width corresponds to gmx::SimdFloat. More... | |
static void gmx_simdcall | gmx::store (std::int32_t *m, SimdFInt32 a) |
Store aligned SIMD integer data, width corresponds to gmx::SimdFloat. More... | |
static SimdFInt32 gmx_simdcall | gmx::simdLoadU (const std::int32_t *m, SimdFInt32Tag) |
Load unaligned integer SIMD data, width corresponds to gmx::SimdFloat. More... | |
static void gmx_simdcall | gmx::storeU (std::int32_t *m, SimdFInt32 a) |
Store unaligned SIMD integer data, width corresponds to gmx::SimdFloat. More... | |
static SimdFInt32 gmx_simdcall | gmx::setZeroFI () |
Set all SIMD (float) integer variable elements to 0. More... | |
template<int index> | |
static std::int32_t gmx_simdcall | gmx::extract (SimdFInt32 a) |
Extract the element selected by the template parameter index from gmx::SimdFInt32. More... | |
SIMD implementation single precision floating-point bitwise logical operations | |
static SimdFloat gmx_simdcall | gmx::operator& (SimdFloat a, SimdFloat b) |
Bitwise and for two SIMD float variables. More... | |
static SimdFloat gmx_simdcall | gmx::andNot (SimdFloat a, SimdFloat b) |
Bitwise andnot for SIMD float. More... | |
static SimdFloat gmx_simdcall | gmx::operator| (SimdFloat a, SimdFloat b) |
Bitwise or for SIMD float. More... | |
static SimdFloat gmx_simdcall | gmx::operator^ (SimdFloat a, SimdFloat b) |
Bitwise xor for SIMD float. More... | |
SIMD implementation single precision floating-point arithmetics | |
static SimdFloat gmx_simdcall | gmx::operator+ (SimdFloat a, SimdFloat b) |
Add two float SIMD variables. More... | |
static SimdFloat gmx_simdcall | gmx::operator- (SimdFloat a, SimdFloat b) |
Subtract two float SIMD variables. More... | |
static SimdFloat gmx_simdcall | gmx::operator- (SimdFloat a) |
SIMD single precision negate. More... | |
static SimdFloat gmx_simdcall | gmx::operator* (SimdFloat a, SimdFloat b) |
Multiply two float SIMD variables. More... | |
static SimdFloat gmx_simdcall | gmx::fma (SimdFloat a, SimdFloat b, SimdFloat c) |
SIMD float Fused-multiply-add. Result is a*b+c. More... | |
static SimdFloat gmx_simdcall | gmx::fms (SimdFloat a, SimdFloat b, SimdFloat c) |
SIMD float Fused-multiply-subtract. Result is a*b-c. More... | |
static SimdFloat gmx_simdcall | gmx::fnma (SimdFloat a, SimdFloat b, SimdFloat c) |
SIMD float Fused-negated-multiply-add. Result is -a*b+c. More... | |
static SimdFloat gmx_simdcall | gmx::fnms (SimdFloat a, SimdFloat b, SimdFloat c) |
SIMD float Fused-negated-multiply-subtract. Result is -a*b-c. More... | |
static SimdFloat gmx_simdcall | gmx::rsqrt (SimdFloat x) |
SIMD float 1.0/sqrt(x) lookup. More... | |
static SimdFloat gmx_simdcall | gmx::rcp (SimdFloat x) |
SIMD float 1.0/x lookup. More... | |
static SimdFloat gmx_simdcall | gmx::maskAdd (SimdFloat a, SimdFloat b, SimdFBool m) |
Add two float SIMD variables, masked version. More... | |
static SimdFloat gmx_simdcall | gmx::maskzMul (SimdFloat a, SimdFloat b, SimdFBool m) |
Multiply two float SIMD variables, masked version. More... | |
static SimdFloat gmx_simdcall | gmx::maskzFma (SimdFloat a, SimdFloat b, SimdFloat c, SimdFBool m) |
SIMD float fused multiply-add, masked version. More... | |
static SimdFloat gmx_simdcall | gmx::maskzRsqrt (SimdFloat x, SimdFBool m) |
SIMD float 1.0/sqrt(x) lookup, masked version. More... | |
static SimdFloat gmx_simdcall | gmx::maskzRcp (SimdFloat x, SimdFBool m) |
SIMD float 1.0/x lookup, masked version. More... | |
static SimdFloat gmx_simdcall | gmx::abs (SimdFloat a) |
SIMD float Floating-point abs(). More... | |
static SimdFloat gmx_simdcall | gmx::max (SimdFloat a, SimdFloat b) |
Set each SIMD float element to the largest from two variables. More... | |
static SimdFloat gmx_simdcall | gmx::min (SimdFloat a, SimdFloat b) |
Set each SIMD float element to the smallest from two variables. More... | |
static SimdFloat gmx_simdcall | gmx::round (SimdFloat a) |
SIMD float round to nearest integer value (in floating-point format). More... | |
static SimdFloat gmx_simdcall | gmx::trunc (SimdFloat a) |
Truncate SIMD float, i.e. round towards zero - common hardware instruction. More... | |
template<MathOptimization opt = MathOptimization::Safe> | |
static SimdFloat gmx_simdcall | gmx::frexp (SimdFloat value, SimdFInt32 *exponent) |
Extract (integer) exponent and fraction from single precision SIMD. More... | |
template<MathOptimization opt = MathOptimization::Safe> | |
static SimdFloat gmx_simdcall | gmx::ldexp (SimdFloat value, SimdFInt32 exponent) |
Multiply a SIMD float value by 2 raised to the power given by exponent. More... | |
static float gmx_simdcall | gmx::reduce (SimdFloat a) |
Return sum of all elements in SIMD float variable. More... | |
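frexp and ldexp are inverses of each other: the first splits a value into a fraction and an integer exponent, and the second recombines them. The sketch below models this per lane with the standard library functions of the same name. It is not GROMACS code, `VecF`, `VecI` and the `*Model` names are hypothetical, and it assumes the fraction is normalised to [0.5, 1) as in std::frexp, which this listing does not confirm.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-ins for a SIMD float and its matching integer type.
using VecF = std::array<float, 4>;
using VecI = std::array<std::int32_t, 4>;

// Split each lane into fraction and exponent so that value = fraction * 2^exponent.
// ASSUMPTION: fraction magnitude lies in [0.5, 1), following std::frexp.
VecF frexpModel(const VecF& value, VecI* exponent)
{
    VecF fraction{};
    for (std::size_t i = 0; i < fraction.size(); ++i)
    {
        int e       = 0;
        fraction[i] = std::frexp(value[i], &e);
        (*exponent)[i] = e;
    }
    return fraction;
}

// Multiply each lane by 2 raised to the matching integer exponent.
VecF ldexpModel(const VecF& value, const VecI& exponent)
{
    VecF r{};
    for (std::size_t i = 0; i < r.size(); ++i)
    {
        r[i] = std::ldexp(value[i], exponent[i]);
    }
    return r;
}
```

Feeding the outputs of the first model into the second reproduces the original values exactly, since only the exponent field changes.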
SIMD implementation single precision floating-point comparisons, boolean, selection. | |
static SimdFBool gmx_simdcall | gmx::operator== (SimdFloat a, SimdFloat b) |
SIMD a==b for single SIMD. More... | |
static SimdFBool gmx_simdcall | gmx::operator!= (SimdFloat a, SimdFloat b) |
SIMD a!=b for single SIMD. More... | |
static SimdFBool gmx_simdcall | gmx::operator< (SimdFloat a, SimdFloat b) |
SIMD a<b for single SIMD. More... | |
static SimdFBool gmx_simdcall | gmx::operator<= (SimdFloat a, SimdFloat b) |
SIMD a<=b for single SIMD. More... | |
static SimdFBool gmx_simdcall | gmx::testBits (SimdFloat a) |
Return true if any bits are set in the single precision SIMD. More... | |
static SimdFBool gmx_simdcall | gmx::operator&& (SimdFBool a, SimdFBool b) |
Logical and on single precision SIMD booleans. More... | |
static SimdFBool gmx_simdcall | gmx::operator|| (SimdFBool a, SimdFBool b) |
Logical or on single precision SIMD booleans. More... | |
static bool gmx_simdcall | gmx::anyTrue (SimdFBool a) |
Returns true if any of the booleans in SIMD a is true, otherwise false. More... | |
static SimdFloat gmx_simdcall | gmx::selectByMask (SimdFloat a, SimdFBool mask) |
Select from single precision SIMD variable where boolean is true. More... | |
static SimdFloat gmx_simdcall | gmx::selectByNotMask (SimdFloat a, SimdFBool mask) |
Select from single precision SIMD variable where boolean is false. More... | |
static SimdFloat gmx_simdcall | gmx::blend (SimdFloat a, SimdFloat b, SimdFBool sel) |
Vector-blend SIMD float selection. More... | |
SIMD implementation integer (corresponding to float) bitwise logical operations | |
static SimdFInt32 gmx_simdcall | gmx::operator& (SimdFInt32 a, SimdFInt32 b) |
Integer SIMD bitwise and. More... | |
static SimdFInt32 gmx_simdcall | gmx::andNot (SimdFInt32 a, SimdFInt32 b) |
Integer SIMD bitwise andnot. c=(~a) & b. More... | |
static SimdFInt32 gmx_simdcall | gmx::operator| (SimdFInt32 a, SimdFInt32 b) |
Integer SIMD bitwise or. More... | |
static SimdFInt32 gmx_simdcall | gmx::operator^ (SimdFInt32 a, SimdFInt32 b) |
Integer SIMD bitwise xor. More... | |
SIMD implementation integer (corresponding to float) arithmetics | |
static SimdFInt32 gmx_simdcall | gmx::operator+ (SimdFInt32 a, SimdFInt32 b) |
Add SIMD integers. More... | |
static SimdFInt32 gmx_simdcall | gmx::operator- (SimdFInt32 a, SimdFInt32 b) |
Subtract SIMD integers. More... | |
static SimdFInt32 gmx_simdcall | gmx::operator* (SimdFInt32 a, SimdFInt32 b) |
Multiply SIMD integers. More... | |
SIMD implementation integer (corresponding to float) comparisons, boolean, selection | |
static SimdFIBool gmx_simdcall | gmx::operator== (SimdFInt32 a, SimdFInt32 b) |
Equality comparison of two integers corresponding to float values. More... | |
static SimdFIBool gmx_simdcall | gmx::operator< (SimdFInt32 a, SimdFInt32 b) |
Less-than comparison of two SIMD integers corresponding to float values. More... | |
static SimdFIBool gmx_simdcall | gmx::testBits (SimdFInt32 a) |
Check if any bit is set in each element. More... | |
static SimdFIBool gmx_simdcall | gmx::operator&& (SimdFIBool a, SimdFIBool b) |
Logical AND on SimdFIBool. More... | |
static SimdFIBool gmx_simdcall | gmx::operator|| (SimdFIBool a, SimdFIBool b) |
Logical OR on SimdFIBool. More... | |
static bool gmx_simdcall | gmx::anyTrue (SimdFIBool a) |
Returns true if any of the booleans in a is true, otherwise false. More... | |
static SimdFInt32 gmx_simdcall | gmx::selectByMask (SimdFInt32 a, SimdFIBool mask) |
Select from gmx::SimdFInt32 variable where boolean is true. More... | |
static SimdFInt32 gmx_simdcall | gmx::selectByNotMask (SimdFInt32 a, SimdFIBool mask) |
Select from gmx::SimdFInt32 variable where boolean is false. More... | |
static SimdFInt32 gmx_simdcall | gmx::blend (SimdFInt32 a, SimdFInt32 b, SimdFIBool sel) |
Vector-blend SIMD integer selection. More... | |
Higher-level SIMD utility functions, double precision. | |
These include generic functions to work with triplets of data, typically coordinates, and a few utility functions to load and update data in the nonbonded kernels. These functions should be available on all implementations. | |
static const int | gmx::c_simdBestPairAlignmentDouble = 2 |
Best alignment to use for aligned pairs of double data. More... | |
template<int align> | |
static void gmx_simdcall | gmx::gatherLoadTranspose (const double *base, const std::int32_t offset[], SimdDouble *v0, SimdDouble *v1, SimdDouble *v2, SimdDouble *v3) |
Load 4 consecutive double from each of GMX_SIMD_DOUBLE_WIDTH offsets, and transpose into 4 SIMD double variables. More... | |
template<int align> | |
static void gmx_simdcall | gmx::gatherLoadTranspose (const double *base, const std::int32_t offset[], SimdDouble *v0, SimdDouble *v1) |
Load 2 consecutive double from each of GMX_SIMD_DOUBLE_WIDTH offsets, and transpose into 2 SIMD double variables. More... | |
template<int align> | |
static void gmx_simdcall | gmx::gatherLoadUTranspose (const double *base, const std::int32_t offset[], SimdDouble *v0, SimdDouble *v1, SimdDouble *v2) |
Load 3 consecutive doubles from each of GMX_SIMD_DOUBLE_WIDTH offsets, and transpose into 3 SIMD double variables. More... | |
template<int align> | |
static void gmx_simdcall | gmx::transposeScatterStoreU (double *base, const std::int32_t offset[], SimdDouble v0, SimdDouble v1, SimdDouble v2) |
Transpose and store 3 SIMD doubles to 3 consecutive addresses at GMX_SIMD_DOUBLE_WIDTH offsets. More... | |
template<int align> | |
static void gmx_simdcall | gmx::transposeScatterIncrU (double *base, const std::int32_t offset[], SimdDouble v0, SimdDouble v1, SimdDouble v2) |
Transpose and add 3 SIMD doubles to 3 consecutive addresses at GMX_SIMD_DOUBLE_WIDTH offsets. More... | |
template<int align> | |
static void gmx_simdcall | gmx::transposeScatterDecrU (double *base, const std::int32_t offset[], SimdDouble v0, SimdDouble v1, SimdDouble v2) |
Transpose and subtract 3 SIMD doubles to 3 consecutive addresses at GMX_SIMD_DOUBLE_WIDTH offsets. More... | |
static void gmx_simdcall | gmx::expandScalarsToTriplets (SimdDouble scalar, SimdDouble *triplets0, SimdDouble *triplets1, SimdDouble *triplets2) |
Expand each element of double SIMD variable into three identical consecutive elements in three SIMD outputs. More... | |
template<int align> | |
static void gmx_simdcall | gmx::gatherLoadBySimdIntTranspose (const double *base, SimdDInt32 offset, SimdDouble *v0, SimdDouble *v1, SimdDouble *v2, SimdDouble *v3) |
Load 4 consecutive doubles from each of GMX_SIMD_DOUBLE_WIDTH offsets specified by a SIMD integer, transpose into 4 SIMD double variables. More... | |
template<int align> | |
static void gmx_simdcall | gmx::gatherLoadUBySimdIntTranspose (const double *base, SimdDInt32 offset, SimdDouble *v0, SimdDouble *v1) |
Load 2 consecutive doubles from each of GMX_SIMD_DOUBLE_WIDTH offsets (unaligned) specified by SIMD integer, transpose into 2 SIMD doubles. More... | |
template<int align> | |
static void gmx_simdcall | gmx::gatherLoadBySimdIntTranspose (const double *base, SimdDInt32 offset, SimdDouble *v0, SimdDouble *v1) |
Load 2 consecutive doubles from each of GMX_SIMD_DOUBLE_WIDTH offsets specified by a SIMD integer, transpose into 2 SIMD double variables. More... | |
static double gmx_simdcall | gmx::reduceIncr4ReturnSum (double *m, SimdDouble v0, SimdDouble v1, SimdDouble v2, SimdDouble v3) |
Reduce each of four SIMD doubles, add those values to four consecutive doubles in memory, return sum. More... | |
Higher-level SIMD utilities accessing partial (half-width) SIMD doubles. | |
See the single-precision versions for documentation. Since double precision is typically half the width of single, this double version is likely only useful with 512-bit and larger implementations. | |
static SimdDouble gmx_simdcall | gmx::loadDualHsimd (const double *m0, const double *m1) |
Load low & high parts of SIMD double from different locations. More... | |
static SimdDouble gmx_simdcall | gmx::loadDuplicateHsimd (const double *m) |
Load half-SIMD-width double data, spread to both halves. More... | |
static SimdDouble gmx_simdcall | gmx::loadU1DualHsimd (const double *m) |
Load two doubles, spread 1st in low half, 2nd in high half. More... | |
static void gmx_simdcall | gmx::storeDualHsimd (double *m0, double *m1, SimdDouble a) |
Store low & high parts of SIMD double to different locations. More... | |
static void gmx_simdcall | gmx::incrDualHsimd (double *m0, double *m1, SimdDouble a) |
Add each half of SIMD variable to separate memory addresses. More... | |
static void gmx_simdcall | gmx::decr3Hsimd (double *m, SimdDouble a0, SimdDouble a1, SimdDouble a2) |
Add the two halves of three SIMD doubles, subtract the sum from three half-SIMD-width consecutive doubles in memory. More... | |
template<int align> | |
static void gmx_simdcall | gmx::gatherLoadTransposeHsimd (const double *base0, const double *base1, std::int32_t offset[], SimdDouble *v0, SimdDouble *v1) |
Load 2 consecutive doubles from each of GMX_SIMD_DOUBLE_WIDTH/2 offsets, transpose into SIMD double (low half from base0, high from base1). More... | |
static double gmx_simdcall | gmx::reduceIncr4ReturnSumHsimd (double *m, SimdDouble v0, SimdDouble v1) |
Reduce the 4 half-SIMD-width doubles in 2 SIMD variables (sum halves), increment four consecutive doubles in memory, return sum. More... | |
static SimdDouble gmx_simdcall | gmx::loadUNDuplicate4 (const double *m) |
Load N doubles and duplicate them 4 times each. More... | |
static SimdDouble gmx_simdcall | gmx::load4DuplicateN (const double *m) |
Load 4 doubles and duplicate them N times each. More... | |
static SimdDouble gmx_simdcall | gmx::loadU4NOffset (const double *m, int offset) |
Load doubles in blocks of 4 at fixed offsets. More... | |
Higher-level SIMD utility functions, single precision. | |
These include generic functions to work with triplets of data, typically coordinates, and a few utility functions to load and update data in the nonbonded kernels. These functions should be available on all implementations, although some wide SIMD implementations (width>=8) also provide special optional versions to work with half or quarter registers to improve the performance in the nonbonded kernels. | |
static const int | gmx::c_simdBestPairAlignmentFloat = 2 |
Best alignment to use for aligned pairs of float data. More... | |
template<int align> | |
static void gmx_simdcall | gmx::gatherLoadTranspose (const float *base, const std::int32_t offset[], SimdFloat *v0, SimdFloat *v1, SimdFloat *v2, SimdFloat *v3) |
Load 4 consecutive floats from each of GMX_SIMD_FLOAT_WIDTH offsets, and transpose into 4 SIMD float variables. More... | |
template<int align> | |
static void gmx_simdcall | gmx::gatherLoadTranspose (const float *base, const std::int32_t offset[], SimdFloat *v0, SimdFloat *v1) |
Load 2 consecutive floats from each of GMX_SIMD_FLOAT_WIDTH offsets, and transpose into 2 SIMD float variables. More... | |
template<int align> | |
static void gmx_simdcall | gmx::gatherLoadUTranspose (const float *base, const std::int32_t offset[], SimdFloat *v0, SimdFloat *v1, SimdFloat *v2) |
Load 3 consecutive floats from each of GMX_SIMD_FLOAT_WIDTH offsets, and transpose into 3 SIMD float variables. More... | |
template<int align> | |
static void gmx_simdcall | gmx::transposeScatterStoreU (float *base, const std::int32_t offset[], SimdFloat v0, SimdFloat v1, SimdFloat v2) |
Transpose and store 3 SIMD floats to 3 consecutive addresses at GMX_SIMD_FLOAT_WIDTH offsets. More... | |
template<int align> | |
static void gmx_simdcall | gmx::transposeScatterIncrU (float *base, const std::int32_t offset[], SimdFloat v0, SimdFloat v1, SimdFloat v2) |
Transpose and add 3 SIMD floats to 3 consecutive addresses at GMX_SIMD_FLOAT_WIDTH offsets. More... | |
template<int align> | |
static void gmx_simdcall | gmx::transposeScatterDecrU (float *base, const std::int32_t offset[], SimdFloat v0, SimdFloat v1, SimdFloat v2) |
Transpose and subtract 3 SIMD floats to 3 consecutive addresses at GMX_SIMD_FLOAT_WIDTH offsets. More... | |
static void gmx_simdcall | gmx::expandScalarsToTriplets (SimdFloat scalar, SimdFloat *triplets0, SimdFloat *triplets1, SimdFloat *triplets2) |
Expand each element of float SIMD variable into three identical consecutive elements in three SIMD outputs. More... | |
template<int align> | |
static void gmx_simdcall | gmx::gatherLoadBySimdIntTranspose (const float *base, SimdFInt32 offset, SimdFloat *v0, SimdFloat *v1, SimdFloat *v2, SimdFloat *v3) |
Load 4 consecutive floats from each of GMX_SIMD_FLOAT_WIDTH offsets specified by a SIMD integer, transpose into 4 SIMD float variables. More... | |
template<int align> | |
static void gmx_simdcall | gmx::gatherLoadUBySimdIntTranspose (const float *base, SimdFInt32 offset, SimdFloat *v0, SimdFloat *v1) |
Load 2 consecutive floats from each of GMX_SIMD_FLOAT_WIDTH offsets (unaligned) specified by SIMD integer, transpose into 2 SIMD floats. More... | |
template<int align> | |
static void gmx_simdcall | gmx::gatherLoadBySimdIntTranspose (const float *base, SimdFInt32 offset, SimdFloat *v0, SimdFloat *v1) |
Load 2 consecutive floats from each of GMX_SIMD_FLOAT_WIDTH offsets specified by a SIMD integer, transpose into 2 SIMD float variables. More... | |
static float gmx_simdcall | gmx::reduceIncr4ReturnSum (float *m, SimdFloat v0, SimdFloat v1, SimdFloat v2, SimdFloat v3) |
Reduce each of four SIMD floats, add those values to four consecutive floats in memory, return sum. More... | |
Higher-level SIMD utilities accessing partial (half-width) SIMD floats. | |
These functions are optional. They are only useful for SIMD implementations where the width is 8 or larger, and where it would be inefficient to process 4*8, 8*8, or more, interactions in parallel. Currently, only Intel provides very wide SIMD implementations, but these also come with excellent support for loading, storing, accessing and shuffling parts of the register in so-called 'lanes' of 4 bytes each. We can use this to load separate parts into the low/high halves of the register in the inner loop of the nonbonded kernel, which e.g. makes it possible to process 4*4 nonbonded interactions as a pattern of 2*8. We can also use implementations with width 16 or greater. To make this more generic, when GMX_SIMD_HAVE_HSIMD_UTIL_REAL is 1, the SIMD implementation provides seven special routines that:
Remember: this is ONLY used when the native SIMD width is large. You will just waste time if you implement it for normal 16-byte SIMD architectures. This is part of the new C++ SIMD interface, so these functions are only available when using C++. Since some Gromacs code relying on the SIMD module is still C (not C++), we have kept the C-style naming for now - this will change once we are entirely C++. | |
static SimdFloat gmx_simdcall | gmx::loadDualHsimd (const float *m0, const float *m1) |
Load low & high parts of SIMD float from different locations. More... | |
static SimdFloat gmx_simdcall | gmx::loadDuplicateHsimd (const float *m) |
Load half-SIMD-width float data, spread to both halves. More... | |
static SimdFloat gmx_simdcall | gmx::loadU1DualHsimd (const float *m) |
Load two floats, spread 1st in low half, 2nd in high half. More... | |
static void gmx_simdcall | gmx::storeDualHsimd (float *m0, float *m1, SimdFloat a) |
Store low & high parts of SIMD float to different locations. More... | |
static void gmx_simdcall | gmx::incrDualHsimd (float *m0, float *m1, SimdFloat a) |
Add each half of SIMD variable to separate memory addresses. More... | |
static void gmx_simdcall | gmx::decr3Hsimd (float *m, SimdFloat a0, SimdFloat a1, SimdFloat a2) |
Add the two halves of three SIMD floats, subtract the sum from three half-SIMD-width consecutive floats in memory. More... | |
template<int align> | |
static void gmx_simdcall | gmx::gatherLoadTransposeHsimd (const float *base0, const float *base1, const std::int32_t offset[], SimdFloat *v0, SimdFloat *v1) |
Load 2 consecutive floats from each of GMX_SIMD_FLOAT_WIDTH/2 offsets, transpose into SIMD float (low half from base0, high from base1). More... | |
static float gmx_simdcall | gmx::reduceIncr4ReturnSumHsimd (float *m, SimdFloat v0, SimdFloat v1) |
Reduce the 4 half-SIMD-width floats in 2 SIMD variables (sum halves), increment four consecutive floats in memory, return sum. More... | |
static SimdFloat gmx_simdcall | gmx::loadUNDuplicate4 (const float *m) |
Load N floats and duplicate them 4 times each. More... | |
static SimdFloat gmx_simdcall | gmx::load4DuplicateN (const float *m) |
Load 4 floats and duplicate them N times each. More... | |
static SimdFloat gmx_simdcall | gmx::loadU4NOffset (const float *m, int offset) |
Load floats in blocks of 4 at fixed offsets. More... | |
SIMD predefined macros to describe high-level capabilities | |
These macros are used to describe the features available in default Gromacs real precision. They are set from the lower-level implementation files that have macros describing single and double precision individually, as well as the implementation details. | |
#define | GMX_SIMD_HAVE_REAL GMX_SIMD_HAVE_FLOAT |
1 if SimdReal is available, otherwise 0. More... | |
#define | GMX_SIMD_REAL_WIDTH GMX_SIMD_FLOAT_WIDTH |
Width of SimdReal. More... | |
#define | GMX_SIMD_HAVE_INT32_EXTRACT GMX_SIMD_HAVE_FINT32_EXTRACT |
1 if support is available for extracting elements from SimdInt32, otherwise 0 More... | |
#define | GMX_SIMD_HAVE_INT32_LOGICAL GMX_SIMD_HAVE_FINT32_LOGICAL |
1 if logical ops are supported on SimdInt32, otherwise 0. More... | |
#define | GMX_SIMD_HAVE_INT32_ARITHMETICS GMX_SIMD_HAVE_FINT32_ARITHMETICS |
1 if arithmetic ops are supported on SimdInt32, otherwise 0. More... | |
#define | GMX_SIMD_HAVE_GATHER_LOADU_BYSIMDINT_TRANSPOSE_REAL GMX_SIMD_HAVE_GATHER_LOADU_BYSIMDINT_TRANSPOSE_FLOAT |
1 if gmx::gatherLoadUBySimdIntTranspose is present, otherwise 0. More... | |
#define | GMX_SIMD_HAVE_HSIMD_UTIL_REAL GMX_SIMD_HAVE_HSIMD_UTIL_FLOAT |
1 if real half-register load/store/reduce utils present, otherwise 0 More... | |
#define | GMX_SIMD4_HAVE_REAL GMX_SIMD4_HAVE_FLOAT |
1 if Simd4Real is available, otherwise 0. More... | |
Classes | |
class | gmx::Simd4Double |
SIMD4 double type. More... | |
class | gmx::Simd4DBool |
SIMD4 variable type to use for logical comparisons on doubles. More... | |
class | gmx::Simd4Float |
SIMD4 float type. More... | |
class | gmx::Simd4FBool |
SIMD4 variable type to use for logical comparisons on floats. More... | |
class | gmx::SimdDouble |
Double SIMD variable. Available if GMX_SIMD_HAVE_DOUBLE is 1. More... | |
class | gmx::SimdDInt32 |
Integer SIMD variable type to use for conversions to/from double. More... | |
class | gmx::SimdDBool |
Boolean type for double SIMD data. More... | |
class | gmx::SimdDIBool |
Boolean type for integer datatypes corresponding to double SIMD. More... | |
class | gmx::SimdFloat |
Float SIMD variable. Available if GMX_SIMD_HAVE_FLOAT is 1. More... | |
class | gmx::SimdFInt32 |
Integer SIMD variable type to use for conversions to/from float. More... | |
class | gmx::SimdFBool |
Boolean type for float SIMD data. More... | |
class | gmx::SimdFIBool |
Boolean type for integer datatypes corresponding to float SIMD. More... | |
Directories | |
directory | simd |
SIMD intrinsics interface (simd) | |
directory | tests |
Unit tests for SIMD intrinsics interface (simd). | |
Files | |
file | simd_support.h |
Functions to query compiled and supported SIMD architectures. | |
file | hsimd_declarations.h |
Declares all Hsimd functions that are not supported. | |
file | impl_reference.h |
Reference SIMD implementation, including SIMD documentation. | |
file | impl_reference_definitions.h |
Reference SIMD implementation, including SIMD documentation. | |
file | impl_reference_general.h |
Reference SIMD implementation, general utility functions. | |
file | impl_reference_simd4_double.h |
Reference implementation, SIMD4 double precision. | |
file | impl_reference_simd4_float.h |
Reference implementation, SIMD4 single precision. | |
file | impl_reference_simd_double.h |
Reference implementation, SIMD double precision. | |
file | impl_reference_simd_float.h |
Reference implementation, SIMD single precision. | |
file | impl_reference_util_double.h |
Reference impl., higher-level double prec. SIMD utility functions. | |
file | impl_reference_util_float.h |
Reference impl., higher-level single prec. SIMD utility functions. | |
file | scalar.h |
Scalar float functions corresponding to GROMACS SIMD functions. | |
file | scalar_math.h |
Scalar math functions mimicking GROMACS SIMD math functions. | |
file | scalar_util.h |
Scalar utility functions mimicking GROMACS SIMD utility functions. | |
file | simd.h |
Definitions, capabilities, and wrappers for SIMD module. | |
file | simd_math.h |
Math functions for SIMD datatypes. | |
file | simd_memory.h |
Declares SimdArrayRef. | |
file | vector_operations.h |
SIMD operations corresponding to Gromacs rvec datatypes. | |
#define GMX_SIMD4_HAVE_REAL GMX_SIMD4_HAVE_FLOAT |
1 if Simd4Real is available, otherwise 0.
GMX_SIMD4_HAVE_DOUBLE if GMX_DOUBLE is 1, otherwise GMX_SIMD4_HAVE_FLOAT.
#define GMX_SIMD_HAVE_GATHER_LOADU_BYSIMDINT_TRANSPOSE_REAL GMX_SIMD_HAVE_GATHER_LOADU_BYSIMDINT_TRANSPOSE_FLOAT |
1 if gmx::gatherLoadUBySimdIntTranspose is present, otherwise 0.
GMX_SIMD_HAVE_GATHER_LOADU_BYSIMDINT_TRANSPOSE_DOUBLE if GMX_DOUBLE is 1, otherwise GMX_SIMD_HAVE_GATHER_LOADU_BYSIMDINT_TRANSPOSE_FLOAT.
#define GMX_SIMD_HAVE_HSIMD_UTIL_REAL GMX_SIMD_HAVE_HSIMD_UTIL_FLOAT |
1 if real half-register load/store/reduce utils present, otherwise 0
GMX_SIMD_HAVE_HSIMD_UTIL_DOUBLE if GMX_DOUBLE is 1, otherwise GMX_SIMD_HAVE_HSIMD_UTIL_FLOAT.
#define GMX_SIMD_HAVE_INT32_ARITHMETICS GMX_SIMD_HAVE_FINT32_ARITHMETICS |
1 if arithmetic ops are supported on SimdInt32, otherwise 0.
GMX_SIMD_HAVE_DINT32_ARITHMETICS if GMX_DOUBLE is 1, otherwise GMX_SIMD_HAVE_FINT32_ARITHMETICS.
#define GMX_SIMD_HAVE_INT32_EXTRACT GMX_SIMD_HAVE_FINT32_EXTRACT |
1 if support is available for extracting elements from SimdInt32, otherwise 0
GMX_SIMD_HAVE_DINT32_EXTRACT if GMX_DOUBLE is 1, otherwise GMX_SIMD_HAVE_FINT32_EXTRACT.
#define GMX_SIMD_HAVE_INT32_LOGICAL GMX_SIMD_HAVE_FINT32_LOGICAL |
1 if logical ops are supported on SimdInt32, otherwise 0.
GMX_SIMD_HAVE_DINT32_LOGICAL if GMX_DOUBLE is 1, otherwise GMX_SIMD_HAVE_FINT32_LOGICAL.
#define GMX_SIMD_HAVE_REAL GMX_SIMD_HAVE_FLOAT |
1 if SimdReal is available, otherwise 0.
GMX_SIMD_HAVE_DOUBLE if GMX_DOUBLE is 1, otherwise GMX_SIMD_HAVE_FLOAT.
#define GMX_SIMD_REAL_WIDTH GMX_SIMD_FLOAT_WIDTH |
Width of SimdReal.
GMX_SIMD_DOUBLE_WIDTH if GMX_DOUBLE is 1, otherwise GMX_SIMD_FLOAT_WIDTH.
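The selection rule shared by the macros above can be sketched as a preprocessor switch. This is a minimal illustration and not the contents of simd.h: the values given to GMX_DOUBLE, GMX_SIMD_FLOAT_WIDTH and GMX_SIMD_DOUBLE_WIDTH are placeholder assumptions, and simdIterations is a hypothetical helper.

```cpp
#include <cassert>

// Placeholder values, assumed for illustration only. In GROMACS these come
// from the build configuration and the selected SIMD implementation.
#define GMX_DOUBLE 0
#define GMX_SIMD_FLOAT_WIDTH 8
#define GMX_SIMD_DOUBLE_WIDTH 4

// "Real" follows the default precision: double if GMX_DOUBLE is 1, else float.
#if GMX_DOUBLE
#    define GMX_SIMD_REAL_WIDTH GMX_SIMD_DOUBLE_WIDTH
#else
#    define GMX_SIMD_REAL_WIDTH GMX_SIMD_FLOAT_WIDTH
#endif

// Hypothetical helper: number of SIMD-width chunks needed to cover n elements.
inline int simdIterations(int n)
{
    assert(n >= 0);
    return (n + GMX_SIMD_REAL_WIDTH - 1) / GMX_SIMD_REAL_WIDTH;
}
```

Code written against the REAL macros then compiles unchanged for either precision.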
|
inlinestatic |
SIMD4 Floating-point fabs().
a | any floating point values |
|
inlinestatic |
|
inlinestatic |
SIMD float Floating-point abs().
a | any floating point values |
|
inlinestatic |
SIMD double floating-point fabs().
a | any floating point values |
|
inlinestatic |
Bitwise andnot for two SIMD4 double variables. c=(~a) & b.
Available if GMX_SIMD_HAVE_LOGICAL is 1.
a | data1 |
b | data2 |
|
inlinestatic |
Bitwise andnot for two SIMD4 float variables. c=(~a) & b.
Available if GMX_SIMD_HAVE_LOGICAL is 1.
a | data1 |
b | data2 |
|
inlinestatic |
Bitwise andnot for SIMD float.
Available if GMX_SIMD_HAVE_LOGICAL is 1.
a | data1 |
b | data2 |
|
inlinestatic |
Bitwise andnot for SIMD double.
Available if GMX_SIMD_HAVE_LOGICAL is 1.
a | data1 |
b | data2 |
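As a scalar illustration of the andnot operation, c=(~a) & b, the sketch below applies it to the raw bits of a single double. The function names are hypothetical and the code does not use the GROMACS SIMD types. Clearing the sign bit with a mask of -0.0 is shown as one common use of such an operation; it is not claimed to be how GROMACS implements abs().

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

// Scalar model of one SIMD element: c = (~a) & b on the raw bit patterns.
inline double andNotScalar(double a, double b)
{
    static_assert(sizeof(double) == sizeof(std::uint64_t), "this sketch expects 64-bit doubles");
    std::uint64_t ia = 0;
    std::uint64_t ib = 0;
    std::memcpy(&ia, &a, sizeof(ia));
    std::memcpy(&ib, &b, sizeof(ib));
    const std::uint64_t ic = ~ia & ib;
    double              c  = 0.0;
    std::memcpy(&c, &ic, sizeof(c));
    return c;
}

// With a = -0.0 only the sign bit is set in a, so (~a) & x clears the sign of x.
inline double absViaAndNot(double x)
{
    const double r = andNotScalar(-0.0, x);
    assert(!std::signbit(r));
    return r;
}
```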
|
inlinestatic |
Integer SIMD bitwise andnot, c=(~a) & b.
Available if GMX_SIMD_HAVE_FINT32_LOGICAL is 1.
a | integer SIMD |
b | integer SIMD |
|
inlinestatic |
Integer SIMD bitwise andnot, c=(~a) & b.
Available if GMX_SIMD_HAVE_DINT32_LOGICAL is 1.
a | integer SIMD |
b | integer SIMD |
|
inlinestatic |
Returns non-zero if any of the booleans in SIMD4 a is true, otherwise 0.
a | Logical variable. |
The actual return value for truth will depend on the architecture, so any non-zero value is considered truth.
|
inlinestatic |
Returns non-zero if any of the booleans in SIMD4 a is true, otherwise 0.
a | Logical variable. |
The actual return value for truth will depend on the architecture, so any non-zero value is considered truth.
|
inlinestatic |
Returns non-zero if any of the booleans in SIMD a is true, otherwise 0.
a | Logical variable. |
The actual return value for truth will depend on the architecture, so any non-zero value is considered truth.
|
inlinestatic |
Returns non-zero if any of the booleans in SIMD a is true, otherwise 0.
a | Logical variable. |
The actual return value for truth will depend on the architecture, so any non-zero value is considered truth.
|
inlinestatic |
Returns true if any of the booleans in a is true, otherwise false.
Available if GMX_SIMD_HAVE_FINT32_ARITHMETICS is 1.
The actual return value for "any true" will depend on the architecture. Any non-zero value should be considered truth.
a | SIMD boolean |
|
inlinestatic |
Returns true if any of the booleans in a is true, otherwise false.
Available if GMX_SIMD_HAVE_DINT32_ARITHMETICS is 1.
The actual return value for "any true" will depend on the architecture. Any non-zero value should be considered truth.
a | SIMD boolean |
|
inlinestatic |
Vector-blend SIMD4 selection.
a | First source |
b | Second source |
sel | Boolean selector |
|
inlinestatic |
Vector-blend SIMD4 selection.
a | First source |
b | Second source |
sel | Boolean selector |
|
inlinestatic |
Vector-blend SIMD float selection.
a | First source |
b | Second source |
sel | Boolean selector |
|
inlinestatic |
Vector-blend SIMD double selection.
a | First source |
b | Second source |
sel | Boolean selector |
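A scalar model of a vector blend may help when reading the parameter list above. The sketch assumes that each result element is taken from b where sel is true and from a otherwise. That choice is an assumption to check against impl_reference.h, and blendScalar is a hypothetical name that works on std::vector rather than on the SIMD types.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Scalar model of a vector blend. Assumption: element i of the result is
// b[i] where sel[i] is true and a[i] otherwise.
std::vector<double> blendScalar(const std::vector<double>& a,
                                const std::vector<double>& b,
                                const std::vector<bool>&   sel)
{
    assert(a.size() == b.size() && a.size() == sel.size());
    std::vector<double> r(a.size());
    for (std::size_t i = 0; i < a.size(); i++)
    {
        r[i] = sel[i] ? b[i] : a[i];
    }
    return r;
}
```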
|
inlinestatic |
Vector-blend SIMD integer selection.
Available if GMX_SIMD_HAVE_FINT32_ARITHMETICS is 1.
a | First source |
b | Second source |
sel | Boolean selector |
|
inlinestatic |
Vector-blend SIMD integer selection.
Available if GMX_SIMD_HAVE_DINT32_ARITHMETICS is 1.
a | First source |
b | Second source |
sel | Boolean selector |
|
inlinestatic |
Convert from single precision boolean to corresponding integer boolean.
a | SIMD floating-point boolean |
|
inlinestatic |
Convert from double precision boolean to corresponding integer boolean.
a | SIMD floating-point boolean |
|
inlinestatic |
Convert SIMD double to float.
This version is available if GMX_SIMD_FLOAT_WIDTH is identical to GMX_SIMD_DOUBLE_WIDTH.
Float/double conversions are complex since the SIMD width could either be different (e.g. on x86) or identical (e.g. IBM QPX). This means you will need to check for the width in the code, and have different code paths.
d | Double-precision SIMD variable |
|
inlinestatic |
Convert SIMD double to float.
This version is available if GMX_SIMD_FLOAT_WIDTH is twice as large as GMX_SIMD_DOUBLE_WIDTH.
Float/double conversions are complex since the SIMD width could either be different (e.g. on x86) or identical (e.g. IBM QPX). This means you will need to check for the width in the code, and have different code paths.
d0 | Double-precision SIMD variable, first half of values to put in f. |
d1 | Double-precision SIMD variable, second half of values to put in f. |
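The element layout of the two-input conversion can be sketched in scalar code as follows. This only models where the values end up (d0 in the first half of f, d1 in the second half). cvtD2FScalar is a hypothetical name, and std::vector stands in for the SIMD registers.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Scalar model of the case where the float width is twice the double width:
// d0 fills the first half of f and d1 fills the second half.
std::vector<float> cvtD2FScalar(const std::vector<double>& d0, const std::vector<double>& d1)
{
    assert(d0.size() == d1.size());
    const std::size_t  doubleWidth = d0.size();
    std::vector<float> f(2 * doubleWidth);
    for (std::size_t i = 0; i < doubleWidth; i++)
    {
        f[i]               = static_cast<float>(d0[i]);
        f[doubleWidth + i] = static_cast<float>(d1[i]);
    }
    return f;
}
```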
|
inlinestatic |
Convert SIMD float to double.
This version is available if GMX_SIMD_FLOAT_WIDTH is identical to GMX_SIMD_DOUBLE_WIDTH.
Float/double conversions are complex since the SIMD width could either be different (e.g. on x86) or identical (e.g. IBM QPX). This means you will need to check for the width in the code, and have different code paths.
f | Single-precision SIMD variable |
|
inlinestatic |
Convert SIMD float to double.
This version is available if GMX_SIMD_FLOAT_WIDTH is twice as large as GMX_SIMD_DOUBLE_WIDTH.
Float/double conversions are complex since the SIMD width could either be different (e.g. on x86) or identical (e.g. IBM QPX). This means you will need to check for the width in the code, and have different code paths.
f | Single-precision SIMD variable | |
[out] | d0 | Double-precision SIMD variable, first half of values from f. |
[out] | d1 | Double-precision SIMD variable, second half of values from f. |
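The reverse direction, where one float variable is split into two double variables, can be modelled in the same scalar fashion. The sketch below is an illustration under the assumption that the float width is even. cvtF2DScalar is a hypothetical name and does not use the GROMACS types.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Scalar model: the first half of f goes to d0, the second half to d1.
void cvtF2DScalar(const std::vector<float>& f, std::vector<double>* d0, std::vector<double>* d1)
{
    assert(d0 != nullptr && d1 != nullptr);
    assert(f.size() % 2 == 0);
    const auto half = static_cast<std::ptrdiff_t>(f.size() / 2);
    d0->assign(f.begin(), f.begin() + half);
    d1->assign(f.begin() + half, f.end());
}
```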
|
inlinestatic |
Convert integer to single precision floating point.
a | SIMD integer |
|
inlinestatic |
Convert integer to double precision floating point.
a | SIMD integer |
|
inlinestatic |
Convert from integer boolean to corresponding single precision boolean.
a | SIMD integer boolean |
|
inlinestatic |
Convert from integer boolean to corresponding double precision boolean.
a | SIMD integer boolean |
|
inlinestatic |
Round single precision floating point to integer.
a | SIMD floating-point |
|
inlinestatic |
Round double precision floating point to integer.
a | SIMD floating-point |
|
inlinestatic |
Truncate single precision floating point to integer.
a | SIMD floating-point |
|
inlinestatic |
Truncate double precision floating point to integer.
a | SIMD floating-point |
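The difference between the rounding and truncating conversions is easiest to see on negative and fractional inputs. The scalar sketch below uses hypothetical names and assumes the default round-to-nearest mode. How exact .5 ties are handled may differ between SIMD implementations, so the example avoids them.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Round to the nearest integer (assumes the default rounding mode).
inline std::int32_t roundToInt(double x)
{
    assert(std::fabs(x) < 2147483647.0);
    return static_cast<std::int32_t>(std::nearbyint(x));
}

// Truncate towards zero.
inline std::int32_t truncToInt(double x)
{
    assert(std::fabs(x) < 2147483647.0);
    return static_cast<std::int32_t>(std::trunc(x));
}
```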
|
inlinestatic |
Add the two halves of three SIMD doubles, subtract the sum from three half-SIMD-width consecutive doubles in memory.
m | half-width aligned memory, from which sum of the halves will be subtracted. |
a0 | SIMD variable. Upper & lower halves will first be added. |
a1 | SIMD variable. Upper & lower halves will second be added. |
a2 | SIMD variable. Upper & lower halves will third be added. |
If the SIMD width is 8 and the vectors contain [a0 b0 c0 d0 e0 f0 g0 h0], [a1 b1 c1 d1 e1 f1 g1 h1] and [a2 b2 c2 d2 e2 f2 g2 h2], the memory will be modified to [m[0]-(a0+e0) m[1]-(b0+f0) m[2]-(c0+g0) m[3]-(d0+h0) m[4]-(a1+e1) m[5]-(b1+f1) m[6]-(c1+g1) m[7]-(d1+h1) m[8]-(a2+e2) m[9]-(b2+f2) m[10]-(c2+g2) m[11]-(d2+h2)].
The memory must be aligned to half SIMD width.
Available if GMX_SIMD_HAVE_HSIMD_UTIL_DOUBLE is 1.
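The memory update described above can be written as a scalar loop. The sketch is a model of the documented result only. decr3HsimdScalar is a hypothetical name, std::vector stands in for both the SIMD variables and the memory block, and alignment is not modelled.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Scalar model: for each of a0, a1 and a2, add the low and high halves and
// subtract the result from the next half-width block of m.
void decr3HsimdScalar(std::vector<double>*       m,
                      const std::vector<double>& a0,
                      const std::vector<double>& a1,
                      const std::vector<double>& a2)
{
    const std::size_t width = a0.size();
    const std::size_t half  = width / 2;
    assert(width % 2 == 0 && a1.size() == width && a2.size() == width);
    assert(m != nullptr && m->size() >= 3 * half);

    const std::vector<double>* a[3] = { &a0, &a1, &a2 };
    for (std::size_t k = 0; k < 3; k++)
    {
        for (std::size_t i = 0; i < half; i++)
        {
            (*m)[k * half + i] -= (*a[k])[i] + (*a[k])[half + i];
        }
    }
}
```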
|
inlinestatic |
Add the two halves of three SIMD floats, subtract the sum from three half-SIMD-width consecutive floats in memory.
m | half-width aligned memory, from which sum of the halves will be subtracted. |
a0 | SIMD variable. Upper & lower halves will first be added. |
a1 | SIMD variable. Upper & lower halves will second be added. |
a2 | SIMD variable. Upper & lower halves will third be added. |
If the SIMD width is 8 and the vectors contain [a0 b0 c0 d0 e0 f0 g0 h0], [a1 b1 c1 d1 e1 f1 g1 h1] and [a2 b2 c2 d2 e2 f2 g2 h2], the memory will be modified to [m[0]-(a0+e0) m[1]-(b0+f0) m[2]-(c0+g0) m[3]-(d0+h0) m[4]-(a1+e1) m[5]-(b1+f1) m[6]-(c1+g1) m[7]-(d1+h1) m[8]-(a2+e2) m[9]-(b2+f2) m[10]-(c2+g2) m[11]-(d2+h2)].
The memory must be aligned to half SIMD width.
Available if GMX_SIMD_HAVE_HSIMD_UTIL_FLOAT is 1.
|
inlinestatic |
Return dot product of two single precision SIMD4 variables.
The dot product is calculated between the first three elements in the two vectors, while the fourth is ignored. The result is returned as a scalar.
a | vector1 |
b | vector2 |
|
inlinestatic |
Return dot product of two double precision SIMD4 variables.
The dot product is calculated between the first three elements in the two vectors, while the fourth is ignored. The result is returned as a scalar.
a | vector1 |
b | vector2 |
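In scalar terms, the SIMD4 dot product uses only the first three of the four elements. A minimal sketch, with the hypothetical name dotProduct3Of4 and plain arrays in place of the SIMD4 types:

```cpp
#include <cassert>

// Scalar model of the SIMD4 dot product: the fourth element is ignored.
inline double dotProduct3Of4(const double* a, const double* b)
{
    assert(a != nullptr && b != nullptr);
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2];
}
```

This fits coordinate data stored as x, y, z plus one padding element, where the padding must not contribute to the result.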
|
inlinestatic |
Expand each element of double SIMD variable into three identical consecutive elements in three SIMD outputs.
scalar | Floating-point input, e.g. [s0 s1 s2 s3] if width=4. | |
[out] | triplets0 | First output, e.g. [s0 s0 s0 s1] if width=4. |
[out] | triplets1 | Second output, e.g. [s1 s1 s2 s2] if width=4. |
[out] | triplets2 | Third output, e.g. [s2 s3 s3 s3] if width=4. |
This routine is meant to be used for things like scalar-vector multiplication, where the vectors are stored in a merged format like [x0 y0 z0 x1 y1 z1 ...], while the scalars are stored as [s0 s1 s2...], and the data cannot easily be changed to SIMD-friendly layout.
In this case, load 3 full-width SIMD variables from the vector array (This will always correspond to GMX_SIMD_DOUBLE_WIDTH triplets), load a single full-width variable from the scalar array, and call this routine to expand the data. You can then simply multiply the first, second and third pair of SIMD variables, and store the three results back into a suitable vector-format array.
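The expansion pattern in the parameter table generalizes to any width: position j of the concatenated outputs holds scalar[j / 3]. The scalar sketch below illustrates this. expandScalarsToTripletsScalar is a hypothetical name, and std::vector stands in for the SIMD variables.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Scalar model: treat the three outputs as one array of 3*width elements
// and fill position j with scalar[j / 3].
void expandScalarsToTripletsScalar(const std::vector<double>& scalar,
                                   std::vector<double>*       triplets0,
                                   std::vector<double>*       triplets1,
                                   std::vector<double>*       triplets2)
{
    assert(triplets0 != nullptr && triplets1 != nullptr && triplets2 != nullptr);
    const std::size_t    width  = scalar.size();
    std::vector<double>* out[3] = { triplets0, triplets1, triplets2 };
    for (std::vector<double>* o : out)
    {
        o->assign(width, 0.0);
    }
    for (std::size_t j = 0; j < 3 * width; j++)
    {
        (*out[j / width])[j % width] = scalar[j / 3];
    }
}
```

Multiplying the three outputs element-wise with three consecutive width-sized chunks of [x0 y0 z0 x1 y1 z1 ...] then scales each vector by its own scalar.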
|
inlinestatic |
Expand each element of float SIMD variable into three identical consecutive elements in three SIMD outputs.
scalar | Floating-point input, e.g. [s0 s1 s2 s3] if width=4. | |
[out] | triplets0 | First output, e.g. [s0 s0 s0 s1] if width=4. |
[out] | triplets1 | Second output, e.g. [s1 s1 s2 s2] if width=4. |
[out] | triplets2 | Third output, e.g. [s2 s3 s3 s3] if width=4. |
This routine is meant to be used for things like scalar-vector multiplication, where the vectors are stored in a merged format like [x0 y0 z0 x1 y1 z1 ...], while the scalars are stored as [s0 s1 s2...], and the data cannot easily be changed to SIMD-friendly layout.
In this case, load 3 full-width SIMD variables from the vector array (This will always correspond to GMX_SIMD_FLOAT_WIDTH triplets), load a single full-width variable from the scalar array, and call this routine to expand the data. You can then simply multiply the first, second and third pair of SIMD variables, and store the three results back into a suitable vector-format array.
|
inlinestatic |
Extract element with index i from gmx::SimdFInt32.
Available if GMX_SIMD_HAVE_FINT32_EXTRACT is 1.
index | Compile-time constant, position to extract (first position is 0) |
a | SIMD variable from which to extract value. |
|
inlinestatic |
Extract element with index i from gmx::SimdDInt32.
Available if GMX_SIMD_HAVE_DINT32_EXTRACT is 1.
index | Compile-time constant, position to extract (first position is 0) |
a | SIMD variable from which to extract value. |
|
inlinestatic |
SIMD4 Fused-multiply-add. Result is a*b+c.
a | factor1 |
b | factor2 |
c | term |
|
inlinestatic |
SIMD4 Fused-multiply-add. Result is a*b+c.
a | factor1 |
b | factor2 |
c | term |
|
inlinestatic |
SIMD float Fused-multiply-add. Result is a*b+c.
a | factor1 |
b | factor2 |
c | term |
|
inlinestatic |
SIMD double Fused-multiply-add. Result is a*b+c.
a | factor1 |
b | factor2 |
c | term |
|
inlinestatic |
SIMD4 Fused-multiply-subtract. Result is a*b-c.
a | factor1 |
b | factor2 |
c | term |
|
inlinestatic |
SIMD4 Fused-multiply-subtract. Result is a*b-c.
a | factor1 |
b | factor2 |
c | term |
|
inlinestatic |
SIMD float Fused-multiply-subtract. Result is a*b-c.
a | factor1 |
b | factor2 |
c | term |
|
inlinestatic |
SIMD double Fused-multiply-subtract. Result is a*b-c.
a | factor1 |
b | factor2 |
c | term |
|
inlinestatic |
SIMD4 Fused-negated-multiply-add. Result is -a*b+c.
a | factor1 |
b | factor2 |
c | term |
|
inlinestatic |
SIMD4 Fused-negated-multiply-add. Result is -a*b+c.
a | factor1 |
b | factor2 |
c | term |
|
inlinestatic |
SIMD float Fused-negated-multiply-add. Result is -a*b+c.
a | factor1 |
b | factor2 |
c | term |
|
inlinestatic |
SIMD double Fused-negated-multiply-add. Result is -a*b+c.
a | factor1 |
b | factor2 |
c | term |
|
inlinestatic |
SIMD4 Fused-negated-multiply-subtract. Result is -a*b-c.
a | factor1 |
b | factor2 |
c | term |
|
inlinestatic |
SIMD4 Fused-negated-multiply-subtract. Result is -a*b-c.
a | factor1 |
b | factor2 |
c | term |
|
inlinestatic |
SIMD float Fused-negated-multiply-subtract. Result is -a*b-c.
a | factor1 |
b | factor2 |
c | term |
|
inlinestatic |
SIMD double Fused-negated-multiply-subtract. Result is -a*b-c.
a | factor1 |
b | factor2 |
c | term |
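The four sign conventions documented above (a*b+c, a*b-c, -a*b+c, -a*b-c) can be compared side by side in scalar code. In this sketch std::fma stands in for the SIMD operations and the function names are hypothetical. Whether a given SIMD implementation rounds once (true fused operation) or twice is hardware dependent and is not modelled here.

```cpp
#include <cassert>
#include <cmath>

inline double fmaScalar(double a, double b, double c) { return std::fma(a, b, c); }    // a*b+c
inline double fmsScalar(double a, double b, double c) { return std::fma(a, b, -c); }   // a*b-c
inline double fnmaScalar(double a, double b, double c) { return std::fma(-a, b, c); }  // -a*b+c
inline double fnmsScalar(double a, double b, double c) { return std::fma(-a, b, -c); } // -a*b-c

// The negated forms are the sign-flipped counterparts of the plain forms:
// -a*b+c == -(a*b-c) and -a*b-c == -(a*b+c).
inline void checkFmaIdentities(double a, double b, double c)
{
    assert(fnmaScalar(a, b, c) == -fmsScalar(a, b, c));
    assert(fnmsScalar(a, b, c) == -fmaScalar(a, b, c));
}
```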
|
inlinestatic |
Extract (integer) exponent and fraction from single precision SIMD.
opt | By default this function behaves like the standard library such that frexp(+-0,exp) returns +-0 and stores 0 in the exponent when value is 0. If you know the argument is always nonzero, you can set the template parameter to MathOptimization::Unsafe to make it slightly faster. |
value | Floating-point value to extract from | |
[out] | exponent | Returned exponent of value, integer SIMD format. |
|
inlinestatic |
Extract (integer) exponent and fraction from double precision SIMD.
opt | By default this function behaves like the standard library such that frexp(+-0,exp) returns +-0 and stores 0 in the exponent when value is 0. If you know the argument is always nonzero, you can set the template parameter to MathOptimization::Unsafe to make it slightly faster. |
value | Floating-point value to extract from | |
[out] | exponent | Returned exponent of value, integer SIMD format. |
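Since the default behaviour is described as matching the standard library, the scalar std::frexp shows what to expect per element, including the zero case. FrexpResult and frexpScalar are hypothetical names used only for this illustration.

```cpp
#include <cassert>
#include <cmath>

struct FrexpResult
{
    double fraction; // magnitude in [0.5, 1), or 0 for a zero input
    int    exponent; // value == fraction * 2^exponent
};

// Scalar counterpart of the per-element operation, using the standard library.
inline FrexpResult frexpScalar(double value)
{
    assert(std::isfinite(value));
    FrexpResult r{};
    r.fraction = std::frexp(value, &r.exponent);
    return r;
}
```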
|
inlinestatic |
Load 4 consecutive doubles from each of GMX_SIMD_DOUBLE_WIDTH offsets specified by a SIMD integer, transpose into 4 SIMD double variables.
align | Alignment of the memory from which we read, i.e. distance (measured in elements, not bytes) between index points. When this is identical to the number of SIMD variables (i.e., 4 for this routine) the input data is packed without padding in memory. See the SIMD parameters for exactly what memory positions are loaded. |
base | Aligned pointer to the start of the memory. | |
offset | SIMD integer type with offsets to the start of each group of four values. | |
[out] | v0 | First component, base[align*offset[i]] for each i. |
[out] | v1 | Second component, base[align*offset[i] + 1] for each i. |
[out] | v2 | Third component, base[align*offset[i] + 2] for each i. |
[out] | v3 | Fourth component, base[align*offset[i] + 3] for each i. |
The floating-point memory locations must be aligned, but only to the smaller of four elements and the floating-point SIMD width.
|
inlinestatic |
Load 4 consecutive floats from each of GMX_SIMD_FLOAT_WIDTH offsets specified by a SIMD integer, transpose into 4 SIMD float variables.
align | Alignment of the memory from which we read, i.e. distance (measured in elements, not bytes) between index points. When this is identical to the number of SIMD variables (i.e., 4 for this routine) the input data is packed without padding in memory. See the SIMD parameters for exactly what memory positions are loaded. |
base | Aligned pointer to the start of the memory. | |
offset | SIMD integer type with offsets to the start of each quadruplet. | |
[out] | v0 | First component, base[align*offset[i]] for each i. |
[out] | v1 | Second component, base[align*offset[i] + 1] for each i. |
[out] | v2 | Third component, base[align*offset[i] + 2] for each i. |
[out] | v3 | Fourth component, base[align*offset[i] + 3] for each i. |
The floating-point memory locations must be aligned, but only to the smaller of four elements and the floating-point SIMD width.
|
inlinestatic |
Load 2 consecutive doubles from each of GMX_SIMD_DOUBLE_WIDTH offsets specified by a SIMD integer, transpose into 2 SIMD double variables.
align | Alignment of the memory from which we read, i.e. distance (measured in elements, not bytes) between index points. When this is identical to the number of SIMD variables (i.e., 2 for this routine) the input data is packed without padding in memory. See the SIMD parameters for exactly what memory positions are loaded. |
base | Aligned pointer to the start of the memory. | |
offset | SIMD integer type with offsets to the start of each pair. | |
[out] | v0 | First component, base[align*offset[i]] for each i. |
[out] | v1 | Second component, base[align*offset[i] + 1] for each i. |
The floating-point memory locations must be aligned, but only to the smaller of two elements and the floating-point SIMD width.
|
inlinestatic |
Load 2 consecutive floats from each of GMX_SIMD_FLOAT_WIDTH offsets specified by a SIMD integer, transpose into 2 SIMD float variables.
align | Alignment of the memory from which we read, i.e. distance (measured in elements, not bytes) between index points. When this is identical to the number of SIMD variables (i.e., 2 for this routine) the input data is packed without padding in memory. See the SIMD parameters for exactly what memory positions are loaded. |
base | Aligned pointer to the start of the memory. | |
offset | SIMD integer type with offsets to the start of each pair. | |
[out] | v0 | First component, base[align*offset[i]] for each i. |
[out] | v1 | Second component, base[align*offset[i] + 1] for each i. |
The floating-point memory locations must be aligned, but only to the smaller of two elements and the floating-point SIMD width.
|
inlinestatic |
Load 4 consecutive doubles from each of GMX_SIMD_DOUBLE_WIDTH offsets, and transpose into 4 SIMD double variables.
align | Alignment of the memory from which we read, i.e. distance (measured in elements, not bytes) between index points. When this is identical to the number of SIMD variables (i.e., 4 for this routine) the input data is packed without padding in memory. See the SIMD parameters for exactly what memory positions are loaded. |
base | Pointer to the start of the memory area | |
offset | Array with offsets to the start of each data point. | |
[out] | v0 | 1st component of data, base[align*offset[i]] for each i. |
[out] | v1 | 2nd component of data, base[align*offset[i] + 1] for each i. |
[out] | v2 | 3rd component of data, base[align*offset[i] + 2] for each i. |
[out] | v3 | 4th component of data, base[align*offset[i] + 3] for each i. |
The floating-point memory locations must be aligned, but only to the smaller of four elements and the floating-point SIMD width.
The offset memory must be aligned to GMX_SIMD_DINT32_WIDTH.
|
inlinestatic |
Load 4 consecutive floats from each of GMX_SIMD_FLOAT_WIDTH offsets, and transpose into 4 SIMD float variables.
align | Alignment of the memory from which we read, i.e. distance (measured in elements, not bytes) between index points. When this is identical to the number of SIMD variables (i.e., 4 for this routine) the input data is packed without padding in memory. See the SIMD parameters for exactly what memory positions are loaded. |
base | Pointer to the start of the memory area | |
offset | Array with offsets to the start of each data point. | |
[out] | v0 | 1st component of data, base[align*offset[i]] for each i. |
[out] | v1 | 2nd component of data, base[align*offset[i] + 1] for each i. |
[out] | v2 | 3rd component of data, base[align*offset[i] + 2] for each i. |
[out] | v3 | 4th component of data, base[align*offset[i] + 3] for each i. |
The floating-point memory locations must be aligned, but only to the smaller of four elements and the floating-point SIMD width.
The offset memory must be aligned to GMX_SIMD_FINT32_WIDTH.
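The indexing rule base[align*offset[i] + k] used by the gather-and-transpose loads can be written as a plain scalar loop. The sketch below assumes a SIMD width of 4 and uses hypothetical names (gatherTranspose4Sketch, c_width); it ignores the alignment requirements and only shows which memory positions end up in which output lane.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

constexpr int c_width = 4; // hypothetical SIMD width for this sketch

// Hypothetical scalar model of a 4-component gather-and-transpose:
// lane i of output k receives base[align * offset[i] + k].
template<int align>
std::array<std::array<float, c_width>, 4> gatherTranspose4Sketch(const float*        base,
                                                                 const std::int32_t* offset)
{
    std::array<std::array<float, c_width>, 4> v{};
    for (int i = 0; i < c_width; i++)
    {
        for (int k = 0; k < 4; k++)
        {
            v[k][i] = base[align * offset[i] + k];
        }
    }
    return v;
}
```

With align equal to 4 the input is packed without padding; a larger align skips the extra elements between data points.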
|
inlinestatic |
Load 2 consecutive doubles from each of GMX_SIMD_DOUBLE_WIDTH offsets, and transpose into 2 SIMD double variables.
align | Alignment of the memory from which we read, i.e. distance (measured in elements, not bytes) between index points. When this is identical to the number of SIMD variables (i.e., 2 for this routine) the input data is packed without padding in memory. See the SIMD parameters for exactly what memory positions are loaded. |
base | Pointer to the start of the memory area | |
offset | Array with offsets to the start of each data point. | |
[out] | v0 | 1st component of data, base[align*offset[i]] for each i. |
[out] | v1 | 2nd component of data, base[align*offset[i] + 1] for each i. |
The floating-point memory locations must be aligned, but only to the smaller of two elements and the floating-point SIMD width.
The offset memory must be aligned to GMX_SIMD_DINT32_WIDTH.
|
inlinestatic |
Load 2 consecutive floats from each of GMX_SIMD_FLOAT_WIDTH offsets, and transpose into 2 SIMD float variables.
align | Alignment of the memory from which we read, i.e. distance (measured in elements, not bytes) between index points. When this is identical to the number of SIMD variables (i.e., 2 for this routine) the input data is packed without padding in memory. See the SIMD parameters for exactly what memory positions are loaded. |
base | Pointer to the start of the memory area | |
offset | Array with offsets to the start of each data point. | |
[out] | v0 | 1st component of data, base[align*offset[i]] for each i. |
[out] | v1 | 2nd component of data, base[align*offset[i] + 1] for each i. |
The floating-point memory locations must be aligned, but only to the smaller of two elements and the floating-point SIMD width.
The offset memory must be aligned to GMX_SIMD_FINT32_WIDTH.
To achieve the best possible performance, you should store your data with alignment c_simdBestPairAlignmentFloat in single, or c_simdBestPairAlignmentDouble in double.
|
inlinestatic |
Load 2 consecutive doubles from each of GMX_SIMD_DOUBLE_WIDTH/2 offsets, transpose into SIMD double (low half from base0, high from base1).
align | Alignment of the storage, i.e. the distance (measured in elements, not bytes) between index points. When this is identical to the number of output components the data is packed without padding. This must be a multiple of the alignment to keep all data aligned. |
base0 | Pointer to base of first aligned memory | |
base1 | Pointer to base of second aligned memory | |
offset | Offset to the start of each pair | |
[out] | v0 | 1st element in each pair, base0 in low and base1 in high half. |
[out] | v1 | 2nd element in each pair, base0 in low and base1 in high half. |
The offset array should be of half the SIMD width length, so it corresponds to the half-SIMD-register operations. This also means it must be aligned to half the integer SIMD width (i.e., GMX_SIMD_DINT32_WIDTH/2).
The floating-point memory locations must be aligned, but only to the smaller of two elements and the floating-point SIMD width.
This routine is primarily designed to load nonbonded parameters in the kernels. It is the equivalent of the full-width routine gatherLoadTranspose(), but just as the other hsimd routines it will pick half-SIMD-width data from base0 and put in the lower half, while the upper half comes from base1.
For an example, assume the SIMD width is 8, align is 2, that base0 is [A0 A1 B0 B1 C0 C1 D0 D1 ...], and base1 [E0 E1 F0 F1 G0 G1 H0 H1...].
Then we will get v0 as [A0 B0 C0 D0 E0 F0 G0 H0] and v1 as [A1 B1 C1 D1 E1 F1 G1 H1].
Available if GMX_SIMD_HAVE_HSIMD_UTIL_DOUBLE is 1.
|
inlinestatic |
Load 2 consecutive floats from each of GMX_SIMD_FLOAT_WIDTH/2 offsets, transpose into SIMD float (low half from base0, high from base1).
align | Alignment of the storage, i.e. the distance (measured in elements, not bytes) between index points. When this is identical to the number of output components the data is packed without padding. This must be a multiple of the alignment to keep all data aligned. |
base0 | Pointer to base of first aligned memory | |
base1 | Pointer to base of second aligned memory | |
offset | Offset to the start of each pair | |
[out] | v0 | 1st element in each pair, base0 in low and base1 in high half. |
[out] | v1 | 2nd element in each pair, base0 in low and base1 in high half. |
The offset array should be of half the SIMD width length, so it corresponds to the half-SIMD-register operations. This also means it must be aligned to half the integer SIMD width (i.e., GMX_SIMD_FINT32_WIDTH/2).
The floating-point memory locations must be aligned, but only to the smaller of two elements and the floating-point SIMD width.
This routine is primarily designed to load nonbonded parameters in the kernels. It is the equivalent of the full-width routine gatherLoadTranspose(), but just as the other hsimd routines it will pick half-SIMD-width data from base0 and put in the lower half, while the upper half comes from base1.
For an example, assume the SIMD width is 8, align is 2, that base0 is [A0 A1 B0 B1 C0 C1 D0 D1 ...], and base1 [E0 E1 F0 F1 G0 G1 H0 H1...].
Then we will get v0 as [A0 B0 C0 D0 E0 F0 G0 H0] and v1 as [A1 B1 C1 D1 E1 F1 G1 H1].
Available if GMX_SIMD_HAVE_HSIMD_UTIL_FLOAT is 1.
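The worked example above (SIMD width 8, align 2) can be reproduced with a scalar sketch. The names below are hypothetical, string labels stand in for the floating-point data, and the offsets are assumed to be [0 1 2 3], which the example implies but does not state.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <string>

constexpr int c_width = 8; // SIMD width assumed in the worked example

struct PairOutput
{
    std::array<std::string, c_width> v0;
    std::array<std::string, c_width> v1;
};

// Hypothetical scalar model: the low half of each output is gathered
// from base0 and the high half from base1, using the same offsets.
template<int align>
PairOutput gatherLoadTransposeHsimdSketch(const std::string*  base0,
                                          const std::string*  base1,
                                          const std::int32_t* offset)
{
    constexpr int half = c_width / 2;
    PairOutput    out;
    for (int i = 0; i < half; i++)
    {
        out.v0[i]        = base0[align * offset[i]];
        out.v1[i]        = base0[align * offset[i] + 1];
        out.v0[half + i] = base1[align * offset[i]];
        out.v1[half + i] = base1[align * offset[i] + 1];
    }
    return out;
}
```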
|
inlinestatic |
Load 2 consecutive doubles from each of GMX_SIMD_DOUBLE_WIDTH offsets (unaligned) specified by SIMD integer, transpose into 2 SIMD doubles.
align | Alignment of the memory from which we read, i.e. distance (measured in elements, not bytes) between index points. When this is identical to the number of SIMD variables (i.e., 2 for this routine) the input data is packed without padding in memory. See the SIMD parameters for exactly what memory positions are loaded. |
base | Pointer to the start of the memory. | |
offset | SIMD integer type with offsets to the start of each pair. | |
[out] | v0 | First component, base[align*offset[i]] for each i. |
[out] | v1 | Second component, base[align*offset[i] + 1] for each i. |
Since some SIMD architectures cannot handle any unaligned loads, this routine is only available if GMX_SIMD_HAVE_GATHER_LOADU_BYSIMDINT_TRANSPOSE is 1.
|
inlinestatic |
Load 2 consecutive floats from each of GMX_SIMD_FLOAT_WIDTH offsets (unaligned) specified by SIMD integer, transpose into 2 SIMD floats.
align | Alignment of the memory from which we read, i.e. distance (measured in elements, not bytes) between index points. When this is identical to the number of SIMD variables (i.e., 2 for this routine) the input data is packed without padding in memory. See the SIMD parameters for exactly what memory positions are loaded. |
base | Pointer to the start of the memory. | |
offset | SIMD integer type with offsets to the start of each pair. | |
[out] | v0 | First component, base[align*offset[i]] for each i. |
[out] | v1 | Second component, base[align*offset[i] + 1] for each i. |
Since some SIMD architectures cannot handle any unaligned loads, this routine is only available if GMX_SIMD_HAVE_GATHER_LOADU_BYSIMDINT_TRANSPOSE is 1.
|
inlinestatic |
Load 3 consecutive doubles from each of GMX_SIMD_DOUBLE_WIDTH offsets, and transpose into 3 SIMD double variables.
align | Alignment of the memory from which we read, i.e. distance (measured in elements, not bytes) between index points. When this is identical to the number of SIMD variables (i.e., 3 for this routine) the input data is packed without padding in memory. See the SIMD parameters for exactly what memory positions are loaded. |
base | Pointer to the start of the memory area | |
offset | Array with offsets to the start of each data point. | |
[out] | v0 | 1st component of data, base[align*offset[i]] for each i. |
[out] | v1 | 2nd component of data, base[align*offset[i] + 1] for each i. |
[out] | v2 | 3rd component of data, base[align*offset[i] + 2] for each i. |
This function can work with both aligned (better performance) and unaligned memory. When the align parameter is not a power-of-two (align==3 would be normal for packed atomic coordinates) the memory obviously cannot be aligned, and we account for this. However, in the case where align is a power-of-two, we assume the base pointer also has the same alignment, which will enable many platforms to use faster aligned memory load operations. An easy way to think of this is that each triplet of data in memory must be aligned to the align parameter you specify when it's a power-of-two.
The offset memory must always be aligned to GMX_SIMD_DINT32_WIDTH, since this enables us to use SIMD loads and gather operations on platforms that support it.
|
inlinestatic |
Load 3 consecutive floats from each of GMX_SIMD_FLOAT_WIDTH offsets, and transpose into 3 SIMD float variables.
align | Alignment of the memory from which we read, i.e. distance (measured in elements, not bytes) between index points. When this is identical to the number of SIMD variables (i.e., 3 for this routine) the input data is packed without padding in memory. See the SIMD parameters for exactly what memory positions are loaded. |
base | Pointer to the start of the memory area | |
offset | Array with offsets to the start of each data point. | |
[out] | v0 | 1st component of data, base[align*offset[i]] for each i. |
[out] | v1 | 2nd component of data, base[align*offset[i] + 1] for each i. |
[out] | v2 | 3rd component of data, base[align*offset[i] + 2] for each i. |
This function can work with both aligned (better performance) and unaligned memory. When the align parameter is not a power-of-two (align==3 would be normal for packed atomic coordinates) the memory obviously cannot be aligned, and we account for this. However, in the case where align is a power-of-two, we assume the base pointer also has the same alignment, which will enable many platforms to use faster aligned memory load operations. An easy way to think of this is that each triplet of data in memory must be aligned to the align parameter you specify when it's a power-of-two.
The offset memory must always be aligned to GMX_SIMD_FINT32_WIDTH, since this enables us to use SIMD loads and gather operations on platforms that support it.
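For the common case of packed atomic coordinates (align==3), the access pattern can be sketched in scalar code as follows. The names are hypothetical, a SIMD width of 4 is assumed, and no alignment handling is modelled; the sketch only shows which elements are collected into each output.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

constexpr int c_width = 4; // hypothetical SIMD width for this sketch

struct Triplets
{
    std::array<float, c_width> v0;
    std::array<float, c_width> v1;
    std::array<float, c_width> v2;
};

// Hypothetical scalar model: lane i reads the three consecutive values
// starting at base[align * offset[i]].
template<int align>
Triplets gatherLoadUTranspose3Sketch(const float* base, const std::int32_t* offset)
{
    Triplets out{};
    for (int i = 0; i < c_width; i++)
    {
        const float* p = base + align * offset[i];
        out.v0[i]      = p[0];
        out.v1[i]      = p[1];
        out.v2[i]      = p[2];
    }
    return out;
}
```

Here offset[i] plays the role of an atom index, and v0, v1 and v2 receive the x, y and z coordinates of the selected atoms.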
|
inlinestatic |
Add each half of SIMD variable to separate memory addresses.
m0 | Pointer to memory aligned to half SIMD width. |
m1 | Pointer to memory aligned to half SIMD width. |
a | SIMD variable. Lower half will be added to m0, upper half to m1. |
The memory must be aligned to half SIMD width.
Available if GMX_SIMD_HAVE_HSIMD_UTIL_DOUBLE is 1.
|
inlinestatic |
Add each half of SIMD variable to separate memory addresses.
m0 | Pointer to memory aligned to half SIMD width. |
m1 | Pointer to memory aligned to half SIMD width. |
a | SIMD variable. Lower half will be added to m0, upper half to m1. |
The memory must be aligned to half SIMD width.
Available if GMX_SIMD_HAVE_HSIMD_UTIL_FLOAT is 1.
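A scalar picture of this half-width increment, assuming a SIMD width of 8 and using hypothetical names, could look like this. Alignment requirements are not modelled.

```cpp
#include <array>
#include <cassert>

constexpr int c_width = 8;           // hypothetical SIMD width
constexpr int c_half  = c_width / 2; // half-SIMD width

// Hypothetical scalar model: the lower half of a is added to m0,
// the upper half to m1.
inline void incrDualHsimdSketch(float* m0, float* m1, const std::array<float, c_width>& a)
{
    for (int i = 0; i < c_half; i++)
    {
        m0[i] += a[i];
        m1[i] += a[c_half + i];
    }
}
```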
|
inlinestatic |
Multiply a SIMD float value by 2 raised to the power of an integer exponent.
opt | By default, this routine will return zero for input arguments that are so small they cannot be reproduced in the current precision. If the unsafe math optimization template parameter setting is used, these tests are skipped, and the result will be undefined (possibly even NaN). This might happen below -127 in single precision or -1023 in double, although some might use denormal support to extend the range. |
value | Floating-point number to multiply with new exponent |
exponent | Integer that will not overflow as 2^exponent. |
|
inlinestatic |
Multiply a SIMD double value by 2 raised to the power of an integer exponent.
opt | By default, this routine will return zero for input arguments that are so small they cannot be reproduced in the current precision. If the unsafe math optimization template parameter setting is used, these tests are skipped, and the result will be undefined (possibly even NaN). This might happen below -127 in single precision or -1023 in double, although some might use denormal support to extend the range. |
value | Floating-point number to multiply with new exponent |
exponent | Integer that will not overflow as 2^exponent. |
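For ordinary arguments the per-lane result is value * 2^exponent, which the standard library expresses as std::ldexp. The sketch below uses the hypothetical name ldexpLane and does not model the underflow handling or the unsafe option described above; std::ldexp underflows gradually through denormals, whereas the SIMD routine is documented to return zero.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical one-lane model for in-range arguments only:
// the result is value multiplied by 2 to the power of exponent.
inline double ldexpLane(double value, int exponent)
{
    return std::ldexp(value, exponent);
}
```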
|
inlinestatic |
Load 4 float values from aligned memory into SIMD4 variable.
m | Pointer to memory aligned to 4 elements. |
|
inlinestatic |
Load 4 double values from aligned memory into SIMD4 variable.
m | Pointer to memory aligned to 4 elements. |
|
inlinestatic |
Load 4 doubles and duplicate them N times each.
m | Pointer to memory aligned to 4 doubles |
Available if GMX_SIMD_HAVE_4NSIMD_UTIL_DOUBLE is 1. N is GMX_SIMD_DOUBLE_WIDTH/4. Different values are contiguous and same values are 4 positions in SIMD apart.
|
inlinestatic |
Load 4 floats and duplicate them N times each.
m | Pointer to memory aligned to 4 floats |
Available if GMX_SIMD_HAVE_4NSIMD_UTIL_FLOAT is 1. N is GMX_SIMD_FLOAT_WIDTH/4. Different values are contiguous and same values are 4 positions in SIMD apart.
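The resulting register layout can be illustrated for an assumed SIMD width of 8 (so N is 2). The names in this sketch are hypothetical and alignment is not modelled.

```cpp
#include <array>
#include <cassert>

constexpr int c_width = 8; // hypothetical SIMD width, giving N = 2

// Hypothetical scalar model: the block of 4 values is repeated N times,
// so equal values sit 4 positions apart in the result.
inline std::array<float, c_width> load4DuplicateNSketch(const float* m)
{
    std::array<float, c_width> r{};
    for (int i = 0; i < c_width; i++)
    {
        r[i] = m[i % 4];
    }
    return r;
}
```

For input [a b c d] this gives [a b c d a b c d].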
|
inlinestatic |
Load SIMD4 float from unaligned memory.
Available if GMX_SIMD_HAVE_LOADU is 1.
m | Pointer to memory, no alignment requirement. |
|
inlinestatic |
Load SIMD4 double from unaligned memory.
Available if GMX_SIMD_HAVE_LOADU is 1.
m | Pointer to memory, no alignment requirement. |
|
inlinestatic |
Load low & high parts of SIMD double from different locations.
m0 | Pointer to memory aligned to half SIMD width. |
m1 | Pointer to memory aligned to half SIMD width. |
Available if GMX_SIMD_HAVE_HSIMD_UTIL_DOUBLE is 1.
|
inlinestatic |
Load low & high parts of SIMD float from different locations.
m0 | Pointer to memory aligned to half SIMD width. |
m1 | Pointer to memory aligned to half SIMD width. |
Available if GMX_SIMD_HAVE_HSIMD_UTIL_FLOAT is 1.
|
inlinestatic |
Load half-SIMD-width double data, spread to both halves.
m | Pointer to memory aligned to half SIMD width. |
Available if GMX_SIMD_HAVE_HSIMD_UTIL_DOUBLE is 1.
|
inlinestatic |
Load half-SIMD-width float data, spread to both halves.
m | Pointer to memory aligned to half SIMD width. |
Available if GMX_SIMD_HAVE_HSIMD_UTIL_FLOAT is 1.
|
inlinestatic |
Load two doubles, spread 1st in low half, 2nd in high half.
m | Pointer to two adjacent double values. |
Available if GMX_SIMD_HAVE_HSIMD_UTIL_DOUBLE is 1.
|
inlinestatic |
Load two floats, spread 1st in low half, 2nd in high half.
m | Pointer to two adjacent float values. |
Available if GMX_SIMD_HAVE_HSIMD_UTIL_FLOAT is 1.
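The three half-SIMD load flavours differ only in where each half of the register comes from. The sketch below models them in scalar code for an assumed SIMD width of 8; all names are hypothetical and alignment is not modelled.

```cpp
#include <array>
#include <cassert>

constexpr int c_width = 8;           // hypothetical SIMD width
constexpr int c_half  = c_width / 2; // half-SIMD width
using Lanes           = std::array<float, c_width>;

// Low half loaded from m0, high half from m1.
inline Lanes loadDualHsimdSketch(const float* m0, const float* m1)
{
    Lanes r{};
    for (int i = 0; i < c_half; i++)
    {
        r[i]          = m0[i];
        r[c_half + i] = m1[i];
    }
    return r;
}

// The same half-width data is placed in both halves.
inline Lanes loadDuplicateHsimdSketch(const float* m)
{
    return loadDualHsimdSketch(m, m);
}

// m[0] fills the low half, m[1] fills the high half.
inline Lanes loadU1DualHsimdSketch(const float* m)
{
    Lanes r{};
    for (int i = 0; i < c_half; i++)
    {
        r[i]          = m[0];
        r[c_half + i] = m[1];
    }
    return r;
}
```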
|
inlinestatic |
Load doubles in blocks of 4 at fixed offsets.
m | Pointer to unaligned memory |
offset | Offset in memory between input blocks of 4 |
Available if GMX_SIMD_HAVE_4NSIMD_UTIL_DOUBLE is 1. Blocks of 4 doubles are loaded from m+n*offset where n is the n-th block of 4 doubles.
|
inlinestatic |
Load floats in blocks of 4 at fixed offsets.
m | Pointer to unaligned memory |
offset | Offset in memory between input blocks of 4 |
Available if GMX_SIMD_HAVE_4NSIMD_UTIL_FLOAT is 1. Blocks of 4 floats are loaded from m+n*offset where n is the n-th block of 4 floats.
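The block addressing m+n*offset can be sketched in scalar code as below, assuming a SIMD width of 8 (two blocks of 4). The function name and the fixed width are assumptions made for illustration.

```cpp
#include <array>
#include <cassert>

constexpr int c_width = 8; // hypothetical SIMD width, i.e. two blocks of 4

// Hypothetical scalar model: block n of the result is read from
// the 4 consecutive values starting at m + n * offset.
inline std::array<float, c_width> loadU4NOffsetSketch(const float* m, int offset)
{
    std::array<float, c_width> r{};
    for (int n = 0; n < c_width / 4; n++)
    {
        for (int j = 0; j < 4; j++)
        {
            r[4 * n + j] = m[n * offset + j];
        }
    }
    return r;
}
```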
|
inlinestatic |
Load N doubles and duplicate them 4 times each.
m | Pointer to unaligned memory |
Available if GMX_SIMD_HAVE_4NSIMD_UTIL_DOUBLE is 1. N is GMX_SIMD_DOUBLE_WIDTH/4. Duplicated values are contiguous and different values are 4 positions in SIMD apart.
|
inlinestatic |
Load N floats and duplicate them 4 times each.
m | Pointer to unaligned memory |
Available if GMX_SIMD_HAVE_4NSIMD_UTIL_FLOAT is 1. N is GMX_SIMD_FLOAT_WIDTH/4. Duplicated values are contiguous and different values are 4 positions in SIMD apart.
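In contrast to the load of 4 values duplicated N times, here each of the N input values fills a run of 4 adjacent lanes. A scalar sketch for an assumed SIMD width of 8 (N is 2), with hypothetical names:

```cpp
#include <array>
#include <cassert>

constexpr int c_width = 8; // hypothetical SIMD width, giving N = 2

// Hypothetical scalar model: value n is copied into lanes 4n .. 4n+3,
// so different values sit 4 positions apart in the result.
inline std::array<float, c_width> loadUNDuplicate4Sketch(const float* m)
{
    std::array<float, c_width> r{};
    for (int i = 0; i < c_width; i++)
    {
        r[i] = m[i / 4];
    }
    return r;
}
```

For input [a b] this gives [a a a a b b b b].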
|
inlinestatic |
Add two float SIMD variables, masked version.
a | term1 |
b | term2 |
m | mask |
|
inlinestatic |
Add two double SIMD variables, masked version.
a | term1 |
b | term2 |
m | mask |
|
inlinestatic |
SIMD float fused multiply-add, masked version.
a | factor1 |
b | factor2 |
c | term |
m | mask |
|
inlinestatic |
SIMD double fused multiply-add, masked version.
a | factor1 |
b | factor2 |
c | term |
m | mask |
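This page does not spell out what the masked arithmetic returns in lanes where the mask is false. The sketch below assumes that the masked add leaves a unchanged in those lanes and that the masked fused multiply-add yields zero there; both conventions and all names are assumptions made for illustration and should be checked against simd.h.

```cpp
#include <array>
#include <cassert>
#include <cmath>

constexpr int c_width = 4; // hypothetical SIMD width
using Lanes           = std::array<float, c_width>;
using Mask            = std::array<bool, c_width>;

// Assumed convention: a+b where the mask is true, a otherwise.
inline Lanes maskAddSketch(const Lanes& a, const Lanes& b, const Mask& m)
{
    Lanes r{};
    for (int i = 0; i < c_width; i++)
    {
        r[i] = a[i] + (m[i] ? b[i] : 0.0f);
    }
    return r;
}

// Assumed convention: a*b+c where the mask is true, 0 otherwise.
inline Lanes maskzFmaSketch(const Lanes& a, const Lanes& b, const Lanes& c, const Mask& m)
{
    Lanes r{};
    for (int i = 0; i < c_width; i++)
    {
        r[i] = m[i] ? std::fma(a[i], b[i], c[i]) : 0.0f;
    }
    return r;
}
```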
|
inlinestatic |
Multiply two float SIMD variables, masked version.
a | factor1 |
b | factor2 |
m | mask |
|
inlinestatic |
Multiply two double SIMD variables, masked version.
a | factor1 |
b | factor2 |
m | mask |
|
inlinestatic |
SIMD float 1.0/x lookup, masked version.
This is a low-level instruction that should only be called from routines implementing the reciprocal in simd_math.h.
x | Argument, x>0 for entries where mask is true. |
m | Mask |
|
inlinestatic |
SIMD double 1.0/x lookup, masked version.
This is a low-level instruction that should only be called from routines implementing the reciprocal in simd_math.h.
x | Argument, x>0 for entries where mask is true. |
m | Mask |
|
inlinestatic |
SIMD float 1.0/sqrt(x) lookup, masked version.
This is a low-level instruction that should only be called from routines implementing the inverse square root in simd_math.h.
x | Argument, x>0 for entries where mask is true. |
m | Mask |
|
inlinestatic |
SIMD double 1.0/sqrt(x) lookup, masked version.
This is a low-level instruction that should only be called from routines implementing the inverse square root in simd_math.h.
x | Argument, x>0 for entries where mask is true. |
m | Mask |
|
inlinestatic |
a!=b for SIMD4 float
a | value1 |
b | value2 |
|
inlinestatic |
a!=b for SIMD4 double
a | value1 |
b | value2 |
|
inlinestatic |
SIMD a!=b for single SIMD.
a | value1 |
b | value2 |
Beware that exact floating-point comparisons are difficult.
|
inlinestatic |
SIMD a!=b for double SIMD.
a | value1 |
b | value2 |
Beware that exact floating-point comparisons are difficult.
|
inlinestatic |
Bitwise and for two SIMD4 float variables.
Supported if GMX_SIMD_HAVE_LOGICAL is 1.
a | data1 |
b | data2 |
|
inlinestatic |
Bitwise and for two SIMD4 double variables.
Supported if GMX_SIMD_HAVE_LOGICAL is 1.
a | data1 |
b | data2 |
|
inlinestatic |
Bitwise and for two SIMD float variables.
Supported if GMX_SIMD_HAVE_LOGICAL is 1.
a | data1 |
b | data2 |
|
inlinestatic |
Bitwise and for two SIMD double variables.
Supported if GMX_SIMD_HAVE_LOGICAL is 1.
a | data1 |
b | data2 |
|
inlinestatic |
Integer SIMD bitwise and.
Available if GMX_SIMD_HAVE_FINT32_LOGICAL is 1.
a | first integer SIMD |
b | second integer SIMD |
|
inlinestatic |
Integer SIMD bitwise and.
Available if GMX_SIMD_HAVE_DINT32_LOGICAL is 1.
a | first integer SIMD |
b | second integer SIMD |
|
inlinestatic |
Logical and on single precision SIMD4 booleans.
a | logical vars 1 |
b | logical vars 2 |
|
inlinestatic |
Logical and on double precision SIMD4 booleans.
a | logical vars 1 |
b | logical vars 2 |
|
inlinestatic |
Logical and on single precision SIMD booleans.
a | logical vars 1 |
b | logical vars 2 |
|
inlinestatic |
Logical and on double precision SIMD booleans.
a | logical vars 1 |
b | logical vars 2 |
|
inlinestatic |
Logical AND on SimdFIBool.
Available if GMX_SIMD_HAVE_FINT32_ARITHMETICS is 1.
a | SIMD boolean 1 |
b | SIMD boolean 2 |
|
inlinestatic |
Logical AND on SimdDIBool.
Available if GMX_SIMD_HAVE_DINT32_ARITHMETICS is 1.
a | SIMD boolean 1 |
b | SIMD boolean 2 |
|
inlinestatic |
Multiply two SIMD4 variables.
a | factor1 |
b | factor2 |
|
inlinestatic |
Multiply two SIMD4 variables.
a | factor1 |
b | factor2 |
|
inlinestatic |
Multiply two float SIMD variables.
a | factor1 |
b | factor2 |
|
inlinestatic |
Multiply two double SIMD variables.
a | factor1 |
b | factor2 |
|
inlinestatic |
Multiply SIMD integers.
This routine is only available if GMX_SIMD_HAVE_FINT32_ARITHMETICS (single) or GMX_SIMD_HAVE_DINT32_ARITHMETICS (double) is 1.
a | factor1 |
b | factor2 |
|
inlinestatic |
Multiply SIMD integers.
Available if GMX_SIMD_HAVE_DINT32_ARITHMETICS is 1.
a | factor1 |
b | factor2 |
|
inlinestatic |
Add two double SIMD4 variables.
a | term1 |
b | term2 |
|
inlinestatic |
Add two float SIMD4 variables.
a | term1 |
b | term2 |
|
inlinestatic |
Add two float SIMD variables.
a | term1 |
b | term2 |
|
inlinestatic |
Add two double SIMD variables.
a | term1 |
b | term2 |
|
inlinestatic |
Add SIMD integers.
This routine is only available if GMX_SIMD_HAVE_FINT32_ARITHMETICS (single) or GMX_SIMD_HAVE_DINT32_ARITHMETICS (double) is 1.
a | term1 |
b | term2 |
|
inlinestatic |
Add SIMD integers.
Available if GMX_SIMD_HAVE_DINT32_ARITHMETICS is 1.
a | term1 |
b | term2 |
|
inlinestatic |
Subtract two SIMD4 variables.
a | term1 |
b | term2 |
|
inlinestatic |
Subtract two SIMD4 variables.
a | term1 |
b | term2 |
|
inlinestatic |
SIMD4 floating-point negate.
a | SIMD4 floating-point value |
|
inlinestatic |
SIMD4 floating-point negate.
a | SIMD4 floating-point value |
|
inlinestatic |
Subtract two float SIMD variables.
a | term1 |
b | term2 |
|
inlinestatic |
Subtract two double SIMD variables.
a | term1 |
b | term2 |
|
inlinestatic |
SIMD single precision negate.
a | SIMD single precision value |
|
inlinestatic |
SIMD double precision negate.
a | SIMD double precision value |
|
inlinestatic |
Subtract SIMD integers.
This routine is only available if GMX_SIMD_HAVE_FINT32_ARITHMETICS (single) or GMX_SIMD_HAVE_DINT32_ARITHMETICS (double) is 1.
a | term1 |
b | term2 |
|
inlinestatic |
Subtract SIMD integers.
Available if GMX_SIMD_HAVE_DINT32_ARITHMETICS is 1.
a | term1 |
b | term2 |
|
inlinestatic |
a<b for SIMD4 float
a | value1 |
b | value2 |
|
inlinestatic |
a<b for SIMD4 double
a | value1 |
b | value2 |
|
inlinestatic |
SIMD a<b for single SIMD.
a | value1 |
b | value2 |
|
inlinestatic |
SIMD a<b for double SIMD.
a | value1 |
b | value2 |
|
inlinestatic |
Less-than comparison of two SIMD integers corresponding to float values.
Available if GMX_SIMD_HAVE_FINT32_ARITHMETICS is 1.
a | SIMD integer1 |
b | SIMD integer2 |
|
inlinestatic |
Less-than comparison of two SIMD integers corresponding to double values.
Available if GMX_SIMD_HAVE_DINT32_ARITHMETICS is 1.
a | SIMD integer1 |
b | SIMD integer2 |
|
inlinestatic |
a<=b for SIMD4 float.
a | value1 |
b | value2 |
|
inlinestatic |
a<=b for SIMD4 double.
a | value1 |
b | value2 |
|
inlinestatic |
SIMD a<=b for single SIMD.
a | value1 |
b | value2 |
|
inlinestatic |
SIMD a<=b for double SIMD.
a | value1 |
b | value2 |
|
inlinestatic |
a==b for SIMD4 float
a | value1 |
b | value2 |
|
inlinestatic |
a==b for SIMD4 double
a | value1 |
b | value2 |
|
inlinestatic |
SIMD a==b for single SIMD.
a | value1 |
b | value2 |
Beware that exact floating-point comparisons are difficult.
|
inlinestatic |
SIMD a==b for double SIMD.
a | value1 |
b | value2 |
Beware that exact floating-point comparisons are difficult.
|
inlinestatic |
Equality comparison of two integers corresponding to float values.
Available if GMX_SIMD_HAVE_FINT32_ARITHMETICS is 1.
a | SIMD integer1 |
b | SIMD integer2 |
|
inlinestatic |
Equality comparison of two integers corresponding to double values.
Available if GMX_SIMD_HAVE_DINT32_ARITHMETICS is 1.
a | SIMD integer1 |
b | SIMD integer2 |
|
inlinestatic |
Bitwise xor for two SIMD4 float variables.
Available if GMX_SIMD_HAVE_LOGICAL is 1.
a | data1 |
b | data2 |
|
inlinestatic |
Bitwise xor for two SIMD4 double variables.
Available if GMX_SIMD_HAVE_LOGICAL is 1.
a | data1 |
b | data2 |
|
inlinestatic |
Bitwise xor for SIMD float.
Available if GMX_SIMD_HAVE_LOGICAL is 1.
a | data1 |
b | data2 |
|
inlinestatic |
Bitwise xor for SIMD double.
Available if GMX_SIMD_HAVE_LOGICAL is 1.
a | data1 |
b | data2 |
|
inlinestatic |
Integer SIMD bitwise xor.
Available if GMX_SIMD_HAVE_FINT32_LOGICAL is 1.
a | first integer SIMD |
b | second integer SIMD |
|
inlinestatic |
Integer SIMD bitwise xor.
Available if GMX_SIMD_HAVE_DINT32_LOGICAL is 1.
a | first integer SIMD |
b | second integer SIMD |
|
inlinestatic |
Bitwise or for two SIMD4 doubles.
Available if GMX_SIMD_HAVE_LOGICAL is 1.
a | data1 |
b | data2 |
|
inlinestatic |
Bitwise or for two SIMD4 floats.
Available if GMX_SIMD_HAVE_LOGICAL is 1.
a | data1 |
b | data2 |
|
inlinestatic |
Bitwise or for SIMD float.
Available if GMX_SIMD_HAVE_LOGICAL is 1.
a | data1 |
b | data2 |
|
inlinestatic |
Bitwise or for SIMD double.
Available if GMX_SIMD_HAVE_LOGICAL is 1.
a | data1 |
b | data2 |
|
inlinestatic |
Integer SIMD bitwise or.
Available if GMX_SIMD_HAVE_FINT32_LOGICAL is 1.
a | first integer SIMD |
b | second integer SIMD |
|
inlinestatic |
Integer SIMD bitwise or.
Available if GMX_SIMD_HAVE_DINT32_LOGICAL is 1.
a | first integer SIMD |
b | second integer SIMD |
|
inlinestatic |
Logical or on single precision SIMD4 booleans.
a | logical vars 1 |
b | logical vars 2 |
Note that this is not necessarily a bitwise operation - the storage format of booleans is implementation-dependent.
|
inlinestatic |
Logical or on double precision SIMD4 booleans.
a | logical vars 1 |
b | logical vars 2 |
Note that this is not necessarily a bitwise operation - the storage format of booleans is implementation-dependent.
|
inlinestatic |
Logical or on single precision SIMD booleans.
a | logical vars 1 |
b | logical vars 2 |
Note that this is not necessarily a bitwise operation - the storage format of booleans is implementation-dependent.
|
inlinestatic |
Logical or on double precision SIMD booleans.
a | logical vars 1 |
b | logical vars 2 |
Note that this is not necessarily a bitwise operation - the storage format of booleans is implementation-dependent.
|
inlinestatic |
Logical OR on SimdFIBool.
Available if GMX_SIMD_HAVE_FINT32_ARITHMETICS is 1.
a | SIMD boolean 1 |
b | SIMD boolean 2 |
|
inlinestatic |
Logical OR on SimdDIBool.
Available if GMX_SIMD_HAVE_DINT32_ARITHMETICS is 1.
a | SIMD boolean 1 |
b | SIMD boolean 2 |
|
inlinestatic |
SIMD float 1.0/x lookup.
This is a low-level instruction that should only be called from routines implementing the reciprocal in simd_math.h.
x | Argument, x!=0 |
|
inlinestatic |
SIMD double 1.0/x lookup.
This is a low-level instruction that should only be called from routines implementing the reciprocal in simd_math.h.
x | Argument, x!=0 |
|
inlinestatic |
Return sum of all elements in SIMD4 float variable.
a | SIMD4 variable to reduce/sum. |
|
inlinestatic |
Return sum of all elements in SIMD4 double variable.
a | SIMD4 variable to reduce/sum. |
|
inlinestatic |
Return sum of all elements in SIMD float variable.
a | SIMD variable to reduce/sum. |
|
inlinestatic |
Return sum of all elements in SIMD double variable.
a | SIMD variable to reduce/sum. |
|
inlinestatic |
Reduce each of four SIMD doubles, add those values to four consecutive doubles in memory, return sum.
m | Pointer to memory where four doubles should be incremented |
v0 | SIMD variable whose sum should be added to m[0] |
v1 | SIMD variable whose sum should be added to m[1] |
v2 | SIMD variable whose sum should be added to m[2] |
v3 | SIMD variable whose sum should be added to m[3] |
The pointer m must be aligned to the smaller of four elements and the floating-point SIMD width.
|
inlinestatic |
Reduce each of four SIMD floats, add those values to four consecutive floats in memory, return sum.
m | Pointer to memory where four floats should be incremented |
v0 | SIMD variable whose sum should be added to m[0] |
v1 | SIMD variable whose sum should be added to m[1] |
v2 | SIMD variable whose sum should be added to m[2] |
v3 | SIMD variable whose sum should be added to m[3] |
The pointer m must be aligned to the smaller of four elements and the floating-point SIMD width.
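The reduce-and-increment semantics can be modelled in plain C++. This is a scalar sketch, not the GROMACS implementation: the width of 8, the `Vec` type and the function names are assumptions made for illustration, and "return sum" is read here as the total over all four variables.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr std::size_t c_width = 8; // hypothetical SIMD width, in floats
using Vec = std::array<float, c_width>;

// True if m is aligned to the smaller of four elements and the SIMD width.
inline bool isAlignedForIncr4(const float* m)
{
    const std::size_t alignBytes = std::min<std::size_t>(4, c_width) * sizeof(float);
    return reinterpret_cast<std::uintptr_t>(m) % alignBytes == 0;
}

// Horizontal sum of one variable.
inline float reduceSketch(const Vec& v)
{
    float sum = 0.0F;
    for (float e : v)
    {
        sum += e;
    }
    return sum;
}

// Adds the horizontal sum of v0..v3 to m[0]..m[3] and returns the total.
inline float reduceIncr4Sketch(float* m, const Vec& v0, const Vec& v1, const Vec& v2, const Vec& v3)
{
    assert(isAlignedForIncr4(m));
    const float s[4] = { reduceSketch(v0), reduceSketch(v1), reduceSketch(v2), reduceSketch(v3) };
    float total = 0.0F;
    for (int i = 0; i < 4; ++i)
    {
        m[i] += s[i];
        total += s[i];
    }
    return total;
}
```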
|
inlinestatic |
Reduce the 4 half-SIMD-width doubles in 2 SIMD variables (sum halves), increment four consecutive doubles in memory, return sum.
m | Pointer to memory where the four values should be incremented |
v0 | Variable whose half-SIMD sums should be added to m[0]/m[1], respectively. |
v1 | Variable whose half-SIMD sums should be added to m[2]/m[3], respectively. |
The pointer m must be aligned, but only to the smaller of four elements and the floating-point SIMD width.
Available if GMX_SIMD_HAVE_HSIMD_UTIL_DOUBLE is 1.
|
inlinestatic |
Reduce the 4 half-SIMD-width floats in 2 SIMD variables (sum halves), increment four consecutive floats in memory, return sum.
m | Pointer to memory where the four values should be incremented |
v0 | Variable whose half-SIMD sums should be added to m[0]/m[1], respectively. |
v1 | Variable whose half-SIMD sums should be added to m[2]/m[3], respectively. |
The pointer m must be aligned, but only to the smaller of four elements and the floating-point SIMD width.
Available if GMX_SIMD_HAVE_HSIMD_UTIL_FLOAT is 1.
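The half-SIMD variant can be pictured with a scalar model as well. The sketch assumes a width of 8 and that the low half of each variable maps to the first of its two memory slots. The parameter descriptions suggest this but do not state it outright, and all names here are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

constexpr std::size_t c_width = 8;           // hypothetical SIMD width, in floats
constexpr std::size_t c_half  = c_width / 2; // elements per half
using Vec = std::array<float, c_width>;

// Sum of the c_half elements starting at index first.
inline float halfSum(const Vec& v, std::size_t first)
{
    float sum = 0.0F;
    for (std::size_t i = first; i < first + c_half; ++i)
    {
        sum += v[i];
    }
    return sum;
}

// m[0]/m[1] receive the low/high half sums of v0, m[2]/m[3] those of v1.
inline float reduceIncr4HsimdSketch(float* m, const Vec& v0, const Vec& v1)
{
    assert(m != nullptr);
    const float s[4] = { halfSum(v0, 0), halfSum(v0, c_half), halfSum(v1, 0), halfSum(v1, c_half) };
    float total = 0.0F;
    for (int i = 0; i < 4; ++i)
    {
        m[i] += s[i];
        total += s[i];
    }
    return total;
}
```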
|
inlinestatic |
SIMD4 Round to nearest integer value (in floating-point format).
a | Any floating-point value |
|
inlinestatic |
SIMD4 Round to nearest integer value (in floating-point format).
a | Any floating-point value |
|
inlinestatic |
SIMD float round to nearest integer value (in floating-point format).
a | Any floating-point value |
|
inlinestatic |
SIMD double round to nearest integer value (in floating-point format).
a | Any floating-point value |
|
inlinestatic |
SIMD4 1.0/sqrt(x) lookup.
This is a low-level instruction that should only be called from routines implementing the inverse square root in simd_math.h.
x | Argument, x>0 |
|
inlinestatic |
SIMD4 1.0/sqrt(x) lookup.
This is a low-level instruction that should only be called from routines implementing the inverse square root in simd_math.h.
x | Argument, x>0 |
|
inlinestatic |
SIMD float 1.0/sqrt(x) lookup.
This is a low-level instruction that should only be called from routines implementing the inverse square root in simd_math.h.
x | Argument, x>0 |
|
inlinestatic |
SIMD double 1.0/sqrt(x) lookup.
This is a low-level instruction that should only be called from routines implementing the inverse square root in simd_math.h.
x | Argument, x>0 |
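For the inverse square root, a low-precision lookup is typically polished with a Newton-Raphson step of a different form than for the reciprocal. The scalar sketch below illustrates that step. It assumes, rather than quotes, what the routines in simd_math.h do, and every name in it is hypothetical.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical stand-in for the hardware lookup: the exact value,
// perturbed by a relative error of 1e-3.
inline double approxRsqrt(double x)
{
    assert(x > 0.0); // documented precondition: x > 0
    return (1.0 / std::sqrt(x)) * (1.0 + 1.0e-3);
}

// One Newton-Raphson step for y ~ 1/sqrt(x): y' = 0.5 * y * (3 - x*y*y).
// A relative error e in y becomes roughly 1.5*e*e in y'.
inline double refineRsqrt(double y, double x)
{
    return 0.5 * y * (3.0 - x * y * y);
}

// Lookup followed by a fixed number of refinement steps.
inline double rsqrtSketch(double x, int iterations)
{
    double y = approxRsqrt(x);
    for (int i = 0; i < iterations; ++i)
    {
        y = refineRsqrt(y, x);
    }
    return y;
}
```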
|
inlinestatic |
Select from single precision SIMD4 variable where boolean is true.
a | Floating-point variable to select from |
mask | Boolean selector |
|
inlinestatic |
Select from single precision SIMD4 variable where boolean is true.
a | Floating-point variable to select from |
mask | Boolean selector |
|
inlinestatic |
Select from single precision SIMD variable where boolean is true.
a | Floating-point variable to select from |
mask | Boolean selector |
|
inlinestatic |
Select from double precision SIMD variable where boolean is true.
a | Floating-point variable to select from |
mask | Boolean selector |
|
inlinestatic |
Select from gmx::SimdFInt32 variable where boolean is true.
Available if GMX_SIMD_HAVE_FINT32_ARITHMETICS is 1.
a | SIMD integer to select from |
mask | Boolean selector |
|
inlinestatic |
Select from gmx::SimdDInt32 variable where boolean is true.
Available if GMX_SIMD_HAVE_DINT32_ARITHMETICS is 1.
a | SIMD integer to select from |
mask | Boolean selector |
|
inlinestatic |
Select from single precision SIMD4 variable where boolean is false.
a | Floating-point variable to select from |
mask | Boolean selector |
|
inlinestatic |
Select from single precision SIMD4 variable where boolean is false.
a | Floating-point variable to select from |
mask | Boolean selector |
|
inlinestatic |
Select from single precision SIMD variable where boolean is false.
a | Floating-point variable to select from |
mask | Boolean selector |
|
inlinestatic |
Select from double precision SIMD variable where boolean is false.
a | Floating-point variable to select from |
mask | Boolean selector |
|
inlinestatic |
Select from gmx::SimdFInt32 variable where boolean is false.
Available if GMX_SIMD_HAVE_FINT32_ARITHMETICS is 1.
a | SIMD integer to select from |
mask | Boolean selector |
|
inlinestatic |
Select from gmx::SimdDInt32 variable where boolean is false.
Available if GMX_SIMD_HAVE_DINT32_ARITHMETICS is 1.
a | SIMD integer to select from |
mask | Boolean selector |
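The select-where-true and select-where-false operations are complementary. The scalar model below assumes that elements which are not selected become zero, which the entries above do not spell out. The width of 4, the `Vec` and `Mask` types and the function names are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

constexpr std::size_t c_width = 4; // hypothetical SIMD width
using Vec  = std::array<float, c_width>;
using Mask = std::array<bool, c_width>;

// Keep a[i] where mask[i] is true, zero elsewhere (assumed behaviour).
inline Vec selectByMaskSketch(const Vec& a, const Mask& mask)
{
    Vec r{};
    for (std::size_t i = 0; i < c_width; ++i)
    {
        r[i] = mask[i] ? a[i] : 0.0F;
    }
    return r;
}

// Keep a[i] where mask[i] is false, zero elsewhere (assumed behaviour).
inline Vec selectByNotMaskSketch(const Vec& a, const Mask& mask)
{
    Vec r{};
    for (std::size_t i = 0; i < c_width; ++i)
    {
        r[i] = mask[i] ? 0.0F : a[i];
    }
    return r;
}

// Usage: the two results partition a, so adding them recovers it.
inline void selectExample()
{
    const Vec  a    = { 1.0F, 2.0F, 3.0F, 4.0F };
    const Mask mask = { true, false, true, false };
    const Vec  kept = selectByMaskSketch(a, mask);
    const Vec  rest = selectByNotMaskSketch(a, mask);
    for (std::size_t i = 0; i < c_width; ++i)
    {
        assert(kept[i] + rest[i] == a[i]);
    }
}
```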
|
inlinestatic |
Set all SIMD double variable elements to 0.0.
You should typically just call gmx::setZero(), which uses proxy objects internally to handle all types rather than adding the suffix used here.
|
inlinestatic |
Set all SIMD (double) integer variable elements to 0.
You should typically just call gmx::setZero(), which uses proxy objects internally to handle all types rather than adding the suffix used here.
|
inlinestatic |
Set all SIMD float variable elements to 0.0.
You should typically just call gmx::setZero(), which uses proxy objects internally to handle all types rather than adding the suffix used here.
|
inlinestatic |
Set all SIMD (float) integer variable elements to 0.
You should typically just call gmx::setZero(), which uses proxy objects internally to handle all types rather than adding the suffix used here.
|
inlinestatic |
Set all SIMD4 double elements to 0.
You should typically just call gmx::setZero(), which uses proxy objects internally to handle all types rather than adding the suffix used here.
|
inlinestatic |
Set all SIMD4 float elements to 0.
You should typically just call gmx::setZero(), which uses proxy objects internally to handle all types rather than adding the suffix used here.
|
inlinestatic |
Load GMX_SIMD_FLOAT_WIDTH float numbers from aligned memory.
m | Pointer to memory aligned to the SIMD width. |
|
inlinestatic |
Load GMX_SIMD_DOUBLE_WIDTH numbers from aligned memory.
m | Pointer to memory aligned to the SIMD width. |
|
inlinestatic |
Load aligned SIMD integer data, width corresponds to gmx::SimdFloat.
You should typically just call gmx::load(), which uses proxy objects internally to handle all types rather than adding the suffix used here.
m | Pointer to memory, aligned to (float) integer SIMD width. |
|
inlinestatic |
Load aligned SIMD integer data, width corresponds to gmx::SimdDouble.
You should typically just call gmx::load(), which uses proxy objects internally to handle all types rather than adding the suffix used here.
m | Pointer to memory, aligned to (double) integer SIMD width. |
|
inlinestatic |
Load SIMD float from unaligned memory.
Available if GMX_SIMD_HAVE_LOADU is 1.
m | Pointer to memory, no alignment requirement. |
|
inlinestatic |
Load SIMD double from unaligned memory.
Available if GMX_SIMD_HAVE_LOADU is 1.
m | Pointer to memory, no alignment requirement. |
|
inlinestatic |
Load unaligned integer SIMD data, width corresponds to gmx::SimdFloat.
You should typically just call gmx::loadU(), which uses proxy objects internally to handle all types rather than adding the suffix used here.
Available if GMX_SIMD_HAVE_LOADU is 1.
m | Pointer to memory, no alignment requirements. |
|
inlinestatic |
Load unaligned integer SIMD data, width corresponds to gmx::SimdDouble.
You should typically just call gmx::loadU(), which uses proxy objects internally to handle all types rather than adding the suffix used here.
Available if GMX_SIMD_HAVE_LOADU is 1.
m | Pointer to memory, no alignment requirements. |
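The difference between the aligned and unaligned loads is a precondition on the pointer, not a difference in the values returned. A minimal scalar sketch, assuming a float width of 8 (the real width is given by GMX_SIMD_FLOAT_WIDTH) and using hypothetical names:

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr std::size_t c_floatWidth = 8; // hypothetical; stands in for GMX_SIMD_FLOAT_WIDTH
using Vec = std::array<float, c_floatWidth>;

// True if m sits on a boundary of one full SIMD register.
inline bool isSimdAligned(const float* m)
{
    return reinterpret_cast<std::uintptr_t>(m) % (c_floatWidth * sizeof(float)) == 0;
}

// Aligned load: the caller must guarantee alignment.
inline Vec loadSketch(const float* m)
{
    assert(isSimdAligned(m));
    Vec v{};
    std::copy_n(m, c_floatWidth, v.begin());
    return v;
}

// Unaligned load: any valid pointer is accepted.
inline Vec loadUSketch(const float* m)
{
    Vec v{};
    std::copy_n(m, c_floatWidth, v.begin());
    return v;
}
```

In calling code, a buffer declared with `alignas(c_floatWidth * sizeof(float))` satisfies the aligned variant at every multiple of the width.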
|
inlinestatic |
Store the contents of SIMD float variable to aligned memory m.
[out] | m | Pointer to memory, aligned to SIMD width. |
a | SIMD variable to store |
|
inlinestatic |
Store the contents of SIMD double variable to aligned memory m.
[out] | m | Pointer to memory, aligned to SIMD width. |
a | SIMD variable to store |
|
inlinestatic |
Store aligned SIMD integer data, width corresponds to gmx::SimdFloat.
m | Memory aligned to (float) integer SIMD width. |
a | SIMD variable to store. |
|
inlinestatic |
Store aligned SIMD integer data, width corresponds to gmx::SimdDouble.
m | Memory aligned to (double) integer SIMD width. |
a | SIMD (double) integer variable to store. |
|
inlinestatic |
Store the contents of SIMD4 double to aligned memory m.
[out] | m | Pointer to memory, aligned to 4 elements. |
a | SIMD4 variable to store |
|
inlinestatic |
Store the contents of SIMD4 float to aligned memory m.
[out] | m | Pointer to memory, aligned to 4 elements. |
a | SIMD4 variable to store |
|
inlinestatic |
Store SIMD4 float to unaligned memory.
Available if GMX_SIMD_HAVE_STOREU is 1.
[out] | m | Pointer to memory, no alignment requirement. |
a | SIMD4 variable to store. |
|
inlinestatic |
Store SIMD4 double to unaligned memory.
Available if GMX_SIMD_HAVE_STOREU is 1.
[out] | m | Pointer to memory, no alignment requirement. |
a | SIMD4 variable to store. |
|
inlinestatic |
Store low & high parts of SIMD double to different locations.
m0 | Pointer to memory aligned to half SIMD width. |
m1 | Pointer to memory aligned to half SIMD width. |
a | SIMD variable. Low half should be stored to m0, high to m1. |
Available if GMX_SIMD_HAVE_HSIMD_UTIL_DOUBLE is 1.
|
inlinestatic |
Store low & high parts of SIMD float to different locations.
m0 | Pointer to memory aligned to half SIMD width. |
m1 | Pointer to memory aligned to half SIMD width. |
a | SIMD variable. Low half should be stored to m0, high to m1. |
Available if GMX_SIMD_HAVE_HSIMD_UTIL_FLOAT is 1.
|
inlinestatic |
Store SIMD float to unaligned memory.
Available if GMX_SIMD_HAVE_STOREU is 1.
[out] | m | Pointer to memory, no alignment requirement. |
a | SIMD variable to store. |
|
inlinestatic |
Store SIMD double to unaligned memory.
Available if GMX_SIMD_HAVE_STOREU is 1.
[out] | m | Pointer to memory, no alignment requirement. |
a | SIMD variable to store. |
|
inlinestatic |
Store unaligned SIMD integer data, width corresponds to gmx::SimdFloat.
Available if GMX_SIMD_HAVE_STOREU is 1.
m | Memory pointer, no alignment requirements. |
a | SIMD variable to store. |
|
inlinestatic |
Store unaligned SIMD integer data, width corresponds to gmx::SimdDouble.
Available if GMX_SIMD_HAVE_STOREU is 1.
m | Memory pointer, no alignment requirements. |
a | SIMD (double) integer variable to store. |
|
inlinestatic |
Return true if any bits are set in the single precision SIMD.
This function is used to handle bitmasks, mainly for exclusions in the inner kernels. Note that it will return true even for -0.0F (sign bit set), so it is not identical to not-equal.
a | value |
|
inlinestatic |
Return true if any bits are set in the double precision SIMD.
This function is used to handle bitmasks, mainly for exclusions in the inner kernels. Note that it will return true even for -0.0 (sign bit set), so it is not identical to not-equal.
a | value |
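The remark about negative zero is easy to verify on a single element. The sketch below assumes IEEE-754 single precision and uses a hypothetical helper name. It inspects the raw bit pattern instead of comparing values.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// True if any bit of the 32-bit pattern of a is set.
inline bool anyBitSet(float a)
{
    static_assert(sizeof(float) == sizeof(std::uint32_t), "sketch assumes 32-bit float");
    std::uint32_t bits = 0;
    std::memcpy(&bits, &a, sizeof(bits)); // well-defined way to read the bits
    return bits != 0U;
}

// Usage: -0.0F compares equal to 0.0F, yet has its sign bit set.
inline void anyBitSetExample()
{
    assert(!anyBitSet(0.0F));
    assert(anyBitSet(-0.0F));
    assert(!(-0.0F != 0.0F));
}
```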
|
inlinestatic |
Check if any bit is set in each element.
Available if GMX_SIMD_HAVE_FINT32_ARITHMETICS is 1.
a | SIMD integer |
|
inlinestatic |
Check if any bit is set in each element.
Available if GMX_SIMD_HAVE_DINT32_ARITHMETICS is 1.
a | SIMD integer |
|
inlinestatic |
SIMD4 float transpose.
[in,out] | v0 | Row 0 on input, column 0 on output |
[in,out] | v1 | Row 1 on input, column 1 on output |
[in,out] | v2 | Row 2 on input, column 2 on output |
[in,out] | v3 | Row 3 on input, column 3 on output |
|
inlinestatic |
SIMD4 double transpose.
[in,out] | v0 | Row 0 on input, column 0 on output |
[in,out] | v1 | Row 1 on input, column 1 on output |
[in,out] | v2 | Row 2 on input, column 2 on output |
[in,out] | v3 | Row 3 on input, column 3 on output |
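The in-place transpose can be stated precisely with a scalar model: element j of variable i is swapped with element i of variable j. The `Vec4` type and the function names below are hypothetical stand-ins for the SIMD4 types.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <utility>

using Vec4 = std::array<float, 4>; // hypothetical stand-in for a SIMD4 variable

// Rows on input become columns on output; the four arguments must be distinct.
inline void transpose4Sketch(Vec4& v0, Vec4& v1, Vec4& v2, Vec4& v3)
{
    Vec4* rows[4] = { &v0, &v1, &v2, &v3 };
    for (std::size_t i = 0; i < 4; ++i)
    {
        for (std::size_t j = i + 1; j < 4; ++j)
        {
            std::swap((*rows[i])[j], (*rows[j])[i]);
        }
    }
}

// Usage: a matrix holding 0..15 row by row.
inline void transpose4Example()
{
    Vec4 v0 = { 0.0F, 1.0F, 2.0F, 3.0F };
    Vec4 v1 = { 4.0F, 5.0F, 6.0F, 7.0F };
    Vec4 v2 = { 8.0F, 9.0F, 10.0F, 11.0F };
    Vec4 v3 = { 12.0F, 13.0F, 14.0F, 15.0F };
    transpose4Sketch(v0, v1, v2, v3);
    assert((v0 == Vec4{ 0.0F, 4.0F, 8.0F, 12.0F }));
    assert((v3 == Vec4{ 3.0F, 7.0F, 11.0F, 15.0F }));
}
```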
|
inlinestatic |
Transpose and subtract 3 SIMD doubles to 3 consecutive addresses at GMX_SIMD_DOUBLE_WIDTH offsets.
align | Alignment of the memory to which we write, i.e. distance (measured in elements, not bytes) between index points. When this is identical to the number of SIMD variables (i.e., 3 for this routine) the output data is packed without padding in memory. See the SIMD parameters for exactly what memory positions are decremented. |
[out] | base | Pointer to start of memory. |
offset | Aligned array with offsets to the start of each triplet. |
v0 | 1st component, subtracted from base[align*offset[i]] |
v1 | 2nd component, subtracted from base[align*offset[i]+1] |
v2 | 3rd component, subtracted from base[align*offset[i]+2] |
This function can work with both aligned (better performance) and unaligned memory. When the align parameter is not a power-of-two (align==3 would be normal for packed atomic coordinates) the memory obviously cannot be aligned, and we account for this. However, in the case where align is a power-of-two, we assume the base pointer also has the same alignment, which will enable many platforms to use faster aligned memory load/store operations. An easy way to think of this is that each triplet of data in memory must be aligned to the align parameter you specify when it's a power-of-two.
The offset memory must always be aligned to GMX_SIMD_FINT32_WIDTH, since this enables us to use SIMD loads and gather operations on platforms that support it.
|
inlinestatic |
Transpose and subtract 3 SIMD floats to 3 consecutive addresses at GMX_SIMD_FLOAT_WIDTH offsets.
align | Alignment of the memory to which we write, i.e. distance (measured in elements, not bytes) between index points. When this is identical to the number of SIMD variables (i.e., 3 for this routine) the output data is packed without padding in memory. See the SIMD parameters for exactly what memory positions are decremented. |
[out] | base | Pointer to start of memory. |
offset | Aligned array with offsets to the start of each triplet. |
v0 | 1st component, subtracted from base[align*offset[i]] |
v1 | 2nd component, subtracted from base[align*offset[i]+1] |
v2 | 3rd component, subtracted from base[align*offset[i]+2] |
This function can work with both aligned (better performance) and unaligned memory. When the align parameter is not a power-of-two (align==3 would be normal for packed atomic coordinates) the memory obviously cannot be aligned, and we account for this. However, in the case where align is a power-of-two, we assume the base pointer also has the same alignment, which will enable many platforms to use faster aligned memory load/store operations. An easy way to think of this is that each triplet of data in memory must be aligned to the align parameter you specify when it's a power-of-two.
The offset memory must always be aligned to GMX_SIMD_FINT32_WIDTH, since this enables us to use SIMD loads and gather operations on platforms that support it.
|
inlinestatic |
Transpose and add 3 SIMD doubles to 3 consecutive addresses at GMX_SIMD_DOUBLE_WIDTH offsets.
align | Alignment of the memory to which we write, i.e. distance (measured in elements, not bytes) between index points. When this is identical to the number of SIMD variables (i.e., 3 for this routine) the output data is packed without padding in memory. See the SIMD parameters for exactly what memory positions are incremented. |
[out] | base | Pointer to the start of the memory area |
offset | Aligned array with offsets to the start of each triplet. |
v0 | 1st component of triplets, added to base[align*offset[i]]. |
v1 | 2nd component of triplets, added to base[align*offset[i] + 1]. |
v2 | 3rd component of triplets, added to base[align*offset[i] + 2]. |
This function can work with both aligned (better performance) and unaligned memory. When the align parameter is not a power-of-two (align==3 would be normal for packed atomic coordinates) the memory obviously cannot be aligned, and we account for this. However, in the case where align is a power-of-two, we assume the base pointer also has the same alignment, which will enable many platforms to use faster aligned memory load/store operations. An easy way to think of this is that each triplet of data in memory must be aligned to the align parameter you specify when it's a power-of-two.
The offset memory must always be aligned to GMX_SIMD_FINT32_WIDTH, since this enables us to use SIMD loads and gather operations on platforms that support it.
|
inlinestatic |
Transpose and add 3 SIMD floats to 3 consecutive addresses at GMX_SIMD_FLOAT_WIDTH offsets.
align | Alignment of the memory to which we write, i.e. distance (measured in elements, not bytes) between index points. When this is identical to the number of SIMD variables (i.e., 3 for this routine) the output data is packed without padding in memory. See the SIMD parameters for exactly what memory positions are incremented. |
[out] | base | Pointer to the start of the memory area |
offset | Aligned array with offsets to the start of each triplet. |
v0 | 1st component of triplets, added to base[align*offset[i]]. |
v1 | 2nd component of triplets, added to base[align*offset[i] + 1]. |
v2 | 3rd component of triplets, added to base[align*offset[i] + 2]. |
This function can work with both aligned (better performance) and unaligned memory. When the align parameter is not a power-of-two (align==3 would be normal for packed atomic coordinates) the memory obviously cannot be aligned, and we account for this. However, in the case where align is a power-of-two, we assume the base pointer also has the same alignment, which will enable many platforms to use faster aligned memory load/store operations. An easy way to think of this is that each triplet of data in memory must be aligned to the align parameter you specify when it's a power-of-two.
The offset memory must always be aligned to GMX_SIMD_FINT32_WIDTH, since this enables us to use SIMD loads and gather operations on platforms that support it.
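The indexing rule base[align*offset[i] + k] is easiest to follow in a scalar model. The sketch below uses a padded layout (align of 4) to show that the padding element is left untouched. The width of 4, the `Vec` type and the function name are hypothetical, passing align as a template argument is an assumption, and the alignment requirement on the offset array is not modelled.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr std::size_t c_width = 4; // hypothetical SIMD width
using Vec = std::array<float, c_width>;

// For each lane i, add (v0[i], v1[i], v2[i]) to the triplet that
// starts at base[align * offset[i]].
template<int align>
inline void scatterIncrSketch(float* base, const std::int32_t* offset, const Vec& v0, const Vec& v1, const Vec& v2)
{
    // Restriction of this sketch: triplets must not overlap.
    static_assert(align >= 3, "each triplet needs room for three elements");
    for (std::size_t i = 0; i < c_width; ++i)
    {
        assert(offset[i] >= 0);
        float* triplet = base + align * offset[i];
        triplet[0] += v0[i];
        triplet[1] += v1[i];
        triplet[2] += v2[i];
    }
}
```

The subtracting variants described earlier follow the same indexing with the sign flipped.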
|
inlinestatic |
Transpose and store 3 SIMD doubles to 3 consecutive addresses at GMX_SIMD_DOUBLE_WIDTH offsets.
align | Alignment of the memory to which we write, i.e. distance (measured in elements, not bytes) between index points. When this is identical to the number of SIMD variables (i.e., 3 for this routine) the output data is packed without padding in memory. See the SIMD parameters for exactly what memory positions are written. |
[out] | base | Pointer to the start of the memory area |
offset | Aligned array with offsets to the start of each triplet. |
v0 | 1st component of triplets, written to base[align*offset[i]]. |
v1 | 2nd component of triplets, written to base[align*offset[i] + 1]. |
v2 | 3rd component of triplets, written to base[align*offset[i] + 2]. |
This function can work with both aligned (better performance) and unaligned memory. When the align parameter is not a power-of-two (align==3 would be normal for packed atomic coordinates) the memory obviously cannot be aligned, and we account for this. However, in the case where align is a power-of-two, we assume the base pointer also has the same alignment, which will enable many platforms to use faster aligned memory store operations. An easy way to think of this is that each triplet of data in memory must be aligned to the align parameter you specify when it's a power-of-two.
The offset memory must always be aligned to GMX_SIMD_FINT32_WIDTH, since this enables us to use SIMD loads and gather operations on platforms that support it.
|
inlinestatic |
Transpose and store 3 SIMD floats to 3 consecutive addresses at GMX_SIMD_FLOAT_WIDTH offsets.
align | Alignment of the memory to which we write, i.e. distance (measured in elements, not bytes) between index points. When this is identical to the number of SIMD variables (i.e., 3 for this routine) the output data is packed without padding in memory. See the SIMD parameters for exactly what memory positions are written. |
[out] | base | Pointer to the start of the memory area |
offset | Aligned array with offsets to the start of each triplet. |
v0 | 1st component of triplets, written to base[align*offset[i]]. |
v1 | 2nd component of triplets, written to base[align*offset[i] + 1]. |
v2 | 3rd component of triplets, written to base[align*offset[i] + 2]. |
This function can work with both aligned (better performance) and unaligned memory. When the align parameter is not a power-of-two (align==3 would be normal for packed atomic coordinates) the memory obviously cannot be aligned, and we account for this. However, in the case where align is a power-of-two, we assume the base pointer also has the same alignment, which will enable many platforms to use faster aligned memory store operations. An easy way to think of this is that each triplet of data in memory must be aligned to the align parameter you specify when it's a power-of-two.
The offset memory must always be aligned to GMX_SIMD_FINT32_WIDTH, since this enables us to use SIMD loads and gather operations on platforms that support it.
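For the packed case mentioned above (align equal to 3, as for atomic coordinates), a scalar model shows where each lane ends up. The x/y/z naming, the width of 4 and the function name are illustrative assumptions, and the sketch ignores the alignment requirement on the offset array.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr std::size_t c_width = 4; // hypothetical SIMD width
using Vec = std::array<float, c_width>;

// Lane i holds the coordinates of atom offset[i]; write them as a
// consecutive (x, y, z) triplet starting at base[align * offset[i]].
template<int align>
inline void scatterStoreSketch(float* base, const std::int32_t* offset, const Vec& x, const Vec& y, const Vec& z)
{
    for (std::size_t i = 0; i < c_width; ++i)
    {
        assert(offset[i] >= 0);
        float* atom = base + align * offset[i];
        atom[0] = x[i];
        atom[1] = y[i];
        atom[2] = z[i];
    }
}
```

With align of 3 and offsets {0, 2, 1, 3}, lane 1 is written to base[6..8] and lane 2 to base[3..5], so the SIMD lanes need not follow memory order.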
|
inlinestatic |
Truncate SIMD4, i.e. round towards zero - common hardware instruction.
a | Any floating-point value |
|
inlinestatic |
Truncate SIMD4, i.e. round towards zero - common hardware instruction.
a | Any floating-point value |
|
inlinestatic |
Truncate SIMD float, i.e. round towards zero - common hardware instruction.
a | Any floating-point value |
|
inlinestatic |
Truncate SIMD double, i.e. round towards zero - common hardware instruction.
a | Any floating-point value |
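Truncation and rounding to nearest differ for every non-integer input, which a per-element scalar model makes visible. The sketch uses the standard library as a stand-in, with hypothetical function names and width. Note that `std::round` resolves exact ties away from zero, and the tie behaviour of the SIMD rounding routines is not specified in this section, so the example avoids ties.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>

constexpr std::size_t c_width = 4; // hypothetical SIMD width
using Vec = std::array<float, c_width>;

// Round every element towards zero.
inline Vec truncSketch(const Vec& a)
{
    Vec r{};
    for (std::size_t i = 0; i < c_width; ++i)
    {
        r[i] = std::trunc(a[i]);
    }
    return r;
}

// Round every element to the nearest integer (ties handled as std::round does).
inline Vec roundSketch(const Vec& a)
{
    Vec r{};
    for (std::size_t i = 0; i < c_width; ++i)
    {
        r[i] = std::round(a[i]);
    }
    return r;
}

// Usage: 2.7 truncates to 2 but rounds to 3; the sign is preserved.
inline void truncVersusRoundExample()
{
    const Vec a = { 2.7F, -2.7F, 0.2F, -0.2F };
    assert((truncSketch(a) == Vec{ 2.0F, -2.0F, 0.0F, 0.0F }));
    assert((roundSketch(a) == Vec{ 3.0F, -3.0F, 0.0F, 0.0F }));
}
```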
|
static |
Best alignment to use for aligned pairs of double data.
The routines to load and transpose data will work with a wide range of alignments, but some might be faster than others, depending on the load instructions available in the hardware. This specifies the best alignment for each implementation when working with pairs of data.
To let each architecture use its optimal layout, this constant tells code outside the SIMD module how to store paired data. It must be at least 2. For example, a value of 2 means the two parameters A and B are stored as [A0 B0 A1 B1], while a value of 4 means [A0 B0 - - A1 B1 - -].
This alignment depends on the efficiency of partial-register load/store operations, and will depend on the architecture.
|
static |
Best alignment to use for aligned pairs of float data.
The routines to load and transpose data will work with a wide range of alignments, but some might be faster than others, depending on the load instructions available in the hardware. This specifies the best alignment for each implementation when working with pairs of data.
To let each architecture use its optimal layout, this constant tells code outside the SIMD module how to store paired data. It must be at least 2. For example, a value of 2 means the two parameters A and B are stored as [A0 B0 A1 B1], while a value of 4 means [A0 B0 - - A1 B1 - -].
This alignment depends on the efficiency of partial-register load/store operations, and will depend on the architecture.
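The two layouts quoted above can be produced by a small packing helper. This is a sketch of how calling code might honour such an alignment constant, not code taken from GROMACS. The function name is hypothetical, and padding slots are filled with zeros purely for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Interleave parameters a and b so that pair i starts at index i * align.
// Slots between pairs are padding and are zero-filled here.
inline std::vector<float> packPairs(const std::vector<float>& a, const std::vector<float>& b, std::size_t align)
{
    assert(align >= 2);           // a pair needs at least two slots
    assert(a.size() == b.size()); // one B for every A
    std::vector<float> out(a.size() * align, 0.0F);
    for (std::size_t i = 0; i < a.size(); ++i)
    {
        out[i * align]     = a[i];
        out[i * align + 1] = b[i];
    }
    return out;
}
```

Calling `packPairs(a, b, 2)` yields [A0 B0 A1 B1], while `packPairs(a, b, 4)` yields [A0 B0 - - A1 B1 - -] with zeros in the padding slots.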