Gromacs  2023
SIMD intrinsics interface (simd)

Description

Provides an architecture-independent way of doing SIMD coding.

An overview of the SIMD implementation is provided in Single-instruction Multiple-data (SIMD) coding. The details are documented in gromacs/simd/simd.h and in the reference implementation, impl_reference.h.

Author
Erik Lindahl erik.lindahl@scilifelab.se

Namespaces

 gmx
 Generic GROMACS namespace.
 

Constant width-4 double precision SIMD types and instructions

static Simd4Double gmx_simdcall gmx::load4 (const double *m)
 Load 4 double values from aligned memory into SIMD4 variable. More...
 
static void gmx_simdcall gmx::store4 (double *m, Simd4Double a)
 Store the contents of SIMD4 double to aligned memory m. More...
 
static Simd4Double gmx_simdcall gmx::load4U (const double *m)
 Load SIMD4 double from unaligned memory. More...
 
static void gmx_simdcall gmx::store4U (double *m, Simd4Double a)
 Store SIMD4 double to unaligned memory. More...
 
static Simd4Double gmx_simdcall gmx::simd4SetZeroD ()
 Set all SIMD4 double elements to 0. More...
 
static Simd4Double gmx_simdcall gmx::operator& (Simd4Double a, Simd4Double b)
 Bitwise and for two SIMD4 double variables. More...
 
static Simd4Double gmx_simdcall gmx::andNot (Simd4Double a, Simd4Double b)
 Bitwise andnot for two SIMD4 double variables. c=(~a) & b. More...
 
static Simd4Double gmx_simdcall gmx::operator| (Simd4Double a, Simd4Double b)
 Bitwise or for two SIMD4 doubles. More...
 
static Simd4Double gmx_simdcall gmx::operator^ (Simd4Double a, Simd4Double b)
 Bitwise xor for two SIMD4 double variables. More...
 
static Simd4Double gmx_simdcall gmx::operator+ (Simd4Double a, Simd4Double b)
 Add two double SIMD4 variables. More...
 
static Simd4Double gmx_simdcall gmx::operator- (Simd4Double a, Simd4Double b)
 Subtract two SIMD4 variables. More...
 
static Simd4Double gmx_simdcall gmx::operator- (Simd4Double a)
 SIMD4 floating-point negate. More...
 
static Simd4Double gmx_simdcall gmx::operator* (Simd4Double a, Simd4Double b)
 Multiply two SIMD4 variables. More...
 
static Simd4Double gmx_simdcall gmx::fma (Simd4Double a, Simd4Double b, Simd4Double c)
 SIMD4 Fused-multiply-add. Result is a*b+c. More...
 
static Simd4Double gmx_simdcall gmx::fms (Simd4Double a, Simd4Double b, Simd4Double c)
 SIMD4 Fused-multiply-subtract. Result is a*b-c. More...
 
static Simd4Double gmx_simdcall gmx::fnma (Simd4Double a, Simd4Double b, Simd4Double c)
 SIMD4 Fused-negated-multiply-add. Result is -a*b+c. More...
 
static Simd4Double gmx_simdcall gmx::fnms (Simd4Double a, Simd4Double b, Simd4Double c)
 SIMD4 Fused-negated-multiply-subtract. Result is -a*b-c. More...
 
static Simd4Double gmx_simdcall gmx::rsqrt (Simd4Double x)
 SIMD4 1.0/sqrt(x) lookup. More...
 
static Simd4Double gmx_simdcall gmx::abs (Simd4Double a)
 SIMD4 Floating-point abs(). More...
 
static Simd4Double gmx_simdcall gmx::max (Simd4Double a, Simd4Double b)
 Set each SIMD4 element to the largest from two variables. More...
 
static Simd4Double gmx_simdcall gmx::min (Simd4Double a, Simd4Double b)
 Set each SIMD4 element to the smallest from two variables. More...
 
static Simd4Double gmx_simdcall gmx::round (Simd4Double a)
 SIMD4 Round to nearest integer value (in floating-point format). More...
 
static Simd4Double gmx_simdcall gmx::trunc (Simd4Double a)
 Truncate SIMD4, i.e. round towards zero - common hardware instruction. More...
 
static double gmx_simdcall gmx::dotProduct (Simd4Double a, Simd4Double b)
 Return dot product of two double precision SIMD4 variables. More...
 
static void gmx_simdcall gmx::transpose (Simd4Double *v0, Simd4Double *v1, Simd4Double *v2, Simd4Double *v3)
 SIMD4 double transpose. More...
 
static Simd4DBool gmx_simdcall gmx::operator== (Simd4Double a, Simd4Double b)
 a==b for SIMD4 double. More...
 
static Simd4DBool gmx_simdcall gmx::operator!= (Simd4Double a, Simd4Double b)
 a!=b for SIMD4 double. More...
 
static Simd4DBool gmx_simdcall gmx::operator< (Simd4Double a, Simd4Double b)
 a<b for SIMD4 double. More...
 
static Simd4DBool gmx_simdcall gmx::operator<= (Simd4Double a, Simd4Double b)
 a<=b for SIMD4 double. More...
 
static Simd4DBool gmx_simdcall gmx::operator&& (Simd4DBool a, Simd4DBool b)
 Logical and on double precision SIMD4 booleans. More...
 
static Simd4DBool gmx_simdcall gmx::operator|| (Simd4DBool a, Simd4DBool b)
 Logical or on double precision SIMD4 booleans. More...
 
static bool gmx_simdcall gmx::anyTrue (Simd4DBool a)
 Returns true if any of the booleans in SIMD4 a is true, otherwise false. More...
 
static Simd4Double gmx_simdcall gmx::selectByMask (Simd4Double a, Simd4DBool mask)
 Select from double precision SIMD4 variable where boolean is true. More...
 
static Simd4Double gmx_simdcall gmx::selectByNotMask (Simd4Double a, Simd4DBool mask)
 Select from double precision SIMD4 variable where boolean is false. More...
 
static Simd4Double gmx_simdcall gmx::blend (Simd4Double a, Simd4Double b, Simd4DBool sel)
 Vector-blend SIMD4 selection. More...
 
static double gmx_simdcall gmx::reduce (Simd4Double a)
 Return sum of all elements in SIMD4 double variable. More...
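
The four fused variants listed above differ only in the signs applied to the product and the addend. The sketch below is a scalar model of those formulas on a plain four-element array; Vec4, mapLanes and the *Model functions are hypothetical names for illustration and are not part of GROMACS. It also does not reproduce the single rounding step of a hardware fused instruction.

```cpp
#include <array>
#include <cstddef>

// Scalar stand-in for a width-4 register; hypothetical, not gmx::Simd4Double.
using Vec4 = std::array<double, 4>;

// Apply a three-operand scalar operation lane by lane.
template<typename Op>
Vec4 mapLanes(const Vec4& a, const Vec4& b, const Vec4& c, Op op)
{
    Vec4 r{};
    for (std::size_t i = 0; i < r.size(); ++i)
    {
        r[i] = op(a[i], b[i], c[i]);
    }
    return r;
}

// a*b+c
Vec4 fmaModel(const Vec4& a, const Vec4& b, const Vec4& c)
{
    return mapLanes(a, b, c, [](double x, double y, double z) { return x * y + z; });
}

// a*b-c
Vec4 fmsModel(const Vec4& a, const Vec4& b, const Vec4& c)
{
    return mapLanes(a, b, c, [](double x, double y, double z) { return x * y - z; });
}

// -a*b+c
Vec4 fnmaModel(const Vec4& a, const Vec4& b, const Vec4& c)
{
    return mapLanes(a, b, c, [](double x, double y, double z) { return -(x * y) + z; });
}

// -a*b-c
Vec4 fnmsModel(const Vec4& a, const Vec4& b, const Vec4& c)
{
    return mapLanes(a, b, c, [](double x, double y, double z) { return -(x * y) - z; });
}
```

With a = {1, 2, 3, 4}, b = {2, 2, 2, 2} and c = {1, 1, 1, 1}, the first lane evaluates to 3, 1, -1 and -3 for the four variants respectively.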
 

Constant width-4 single precision SIMD types and instructions

static Simd4Float gmx_simdcall gmx::load4 (const float *m)
 Load 4 float values from aligned memory into SIMD4 variable. More...
 
static void gmx_simdcall gmx::store4 (float *m, Simd4Float a)
 Store the contents of SIMD4 float to aligned memory m. More...
 
static Simd4Float gmx_simdcall gmx::load4U (const float *m)
 Load SIMD4 float from unaligned memory. More...
 
static void gmx_simdcall gmx::store4U (float *m, Simd4Float a)
 Store SIMD4 float to unaligned memory. More...
 
static Simd4Float gmx_simdcall gmx::simd4SetZeroF ()
 Set all SIMD4 float elements to 0. More...
 
static Simd4Float gmx_simdcall gmx::operator& (Simd4Float a, Simd4Float b)
 Bitwise and for two SIMD4 float variables. More...
 
static Simd4Float gmx_simdcall gmx::andNot (Simd4Float a, Simd4Float b)
 Bitwise andnot for two SIMD4 float variables. c=(~a) & b. More...
 
static Simd4Float gmx_simdcall gmx::operator| (Simd4Float a, Simd4Float b)
 Bitwise or for two SIMD4 floats. More...
 
static Simd4Float gmx_simdcall gmx::operator^ (Simd4Float a, Simd4Float b)
 Bitwise xor for two SIMD4 float variables. More...
 
static Simd4Float gmx_simdcall gmx::operator+ (Simd4Float a, Simd4Float b)
 Add two float SIMD4 variables. More...
 
static Simd4Float gmx_simdcall gmx::operator- (Simd4Float a, Simd4Float b)
 Subtract two SIMD4 variables. More...
 
static Simd4Float gmx_simdcall gmx::operator- (Simd4Float a)
 SIMD4 floating-point negate. More...
 
static Simd4Float gmx_simdcall gmx::operator* (Simd4Float a, Simd4Float b)
 Multiply two SIMD4 variables. More...
 
static Simd4Float gmx_simdcall gmx::fma (Simd4Float a, Simd4Float b, Simd4Float c)
 SIMD4 Fused-multiply-add. Result is a*b+c. More...
 
static Simd4Float gmx_simdcall gmx::fms (Simd4Float a, Simd4Float b, Simd4Float c)
 SIMD4 Fused-multiply-subtract. Result is a*b-c. More...
 
static Simd4Float gmx_simdcall gmx::fnma (Simd4Float a, Simd4Float b, Simd4Float c)
 SIMD4 Fused-negated-multiply-add. Result is -a*b+c. More...
 
static Simd4Float gmx_simdcall gmx::fnms (Simd4Float a, Simd4Float b, Simd4Float c)
 SIMD4 Fused-negated-multiply-subtract. Result is -a*b-c. More...
 
static Simd4Float gmx_simdcall gmx::rsqrt (Simd4Float x)
 SIMD4 1.0/sqrt(x) lookup. More...
 
static Simd4Float gmx_simdcall gmx::abs (Simd4Float a)
 SIMD4 Floating-point fabs(). More...
 
static Simd4Float gmx_simdcall gmx::max (Simd4Float a, Simd4Float b)
 Set each SIMD4 element to the largest from two variables. More...
 
static Simd4Float gmx_simdcall gmx::min (Simd4Float a, Simd4Float b)
 Set each SIMD4 element to the smallest from two variables. More...
 
static Simd4Float gmx_simdcall gmx::round (Simd4Float a)
 SIMD4 Round to nearest integer value (in floating-point format). More...
 
static Simd4Float gmx_simdcall gmx::trunc (Simd4Float a)
 Truncate SIMD4, i.e. round towards zero - common hardware instruction. More...
 
static float gmx_simdcall gmx::dotProduct (Simd4Float a, Simd4Float b)
 Return dot product of two single precision SIMD4 variables. More...
 
static void gmx_simdcall gmx::transpose (Simd4Float *v0, Simd4Float *v1, Simd4Float *v2, Simd4Float *v3)
 SIMD4 float transpose. More...
 
static Simd4FBool gmx_simdcall gmx::operator== (Simd4Float a, Simd4Float b)
 a==b for SIMD4 float. More...
 
static Simd4FBool gmx_simdcall gmx::operator!= (Simd4Float a, Simd4Float b)
 a!=b for SIMD4 float. More...
 
static Simd4FBool gmx_simdcall gmx::operator< (Simd4Float a, Simd4Float b)
 a<b for SIMD4 float. More...
 
static Simd4FBool gmx_simdcall gmx::operator<= (Simd4Float a, Simd4Float b)
 a<=b for SIMD4 float. More...
 
static Simd4FBool gmx_simdcall gmx::operator&& (Simd4FBool a, Simd4FBool b)
 Logical and on single precision SIMD4 booleans. More...
 
static Simd4FBool gmx_simdcall gmx::operator|| (Simd4FBool a, Simd4FBool b)
 Logical or on single precision SIMD4 booleans. More...
 
static bool gmx_simdcall gmx::anyTrue (Simd4FBool a)
 Returns true if any of the booleans in SIMD4 a is true, otherwise false. More...
 
static Simd4Float gmx_simdcall gmx::selectByMask (Simd4Float a, Simd4FBool mask)
 Select from single precision SIMD4 variable where boolean is true. More...
 
static Simd4Float gmx_simdcall gmx::selectByNotMask (Simd4Float a, Simd4FBool mask)
 Select from single precision SIMD4 variable where boolean is false. More...
 
static Simd4Float gmx_simdcall gmx::blend (Simd4Float a, Simd4Float b, Simd4FBool sel)
 Vector-blend SIMD4 selection. More...
 
static float gmx_simdcall gmx::reduce (Simd4Float a)
 Return sum of all elements in SIMD4 float variable. More...
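
The transpose entry above takes four pointers and works in place. As a rough picture of what a 4x4 transpose does to the lanes, the following scalar model treats each argument as one row of a matrix and swaps rows with columns. Vec4f and transposeModel are hypothetical names, and the real function operates on gmx::Simd4Float registers rather than arrays.

```cpp
#include <array>
#include <cstddef>
#include <utility>

// Scalar stand-in for a width-4 single precision register (hypothetical).
using Vec4f = std::array<float, 4>;

// In-place 4x4 transpose: lane j of row i is exchanged with lane i of row j.
void transposeModel(Vec4f* v0, Vec4f* v1, Vec4f* v2, Vec4f* v3)
{
    Vec4f* rows[4] = { v0, v1, v2, v3 };
    for (std::size_t i = 0; i < 4; ++i)
    {
        for (std::size_t j = i + 1; j < 4; ++j)
        {
            std::swap((*rows[i])[j], (*rows[j])[i]);
        }
    }
}
```

A typical use is converting four packed records (for instance x, y, z plus one padding value each) into one vector of all x values, one of all y values, and so on.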
 

SIMD implementation load/store operations for double precision floating point

static SimdDouble gmx_simdcall gmx::simdLoad (const double *m, SimdDoubleTag={})
 Load GMX_SIMD_DOUBLE_WIDTH numbers from aligned memory. More...
 
static void gmx_simdcall gmx::store (double *m, SimdDouble a)
 Store the contents of SIMD double variable to aligned memory m. More...
 
static SimdDouble gmx_simdcall gmx::simdLoadU (const double *m, SimdDoubleTag={})
 Load SIMD double from unaligned memory. More...
 
static void gmx_simdcall gmx::storeU (double *m, SimdDouble a)
 Store SIMD double to unaligned memory. More...
 
static SimdDouble gmx_simdcall gmx::setZeroD ()
 Set all SIMD double variable elements to 0.0. More...
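
The list above distinguishes aligned (simdLoad, store) from unaligned (simdLoadU, storeU) access. The sketch below shows one way to reason about which variant a pointer qualifies for. It assumes that "aligned" means aligned to the full register size, and it uses a hypothetical width of 4 in place of GMX_SIMD_DOUBLE_WIDTH, which is fixed when GROMACS is built. c_modelWidth, isAlignedForModelWidth and AlignedBuffer are illustrative names only.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical width; the real value is GMX_SIMD_DOUBLE_WIDTH.
constexpr std::size_t c_modelWidth = 4;

// Assumption: an aligned load needs the address to be a multiple of the register size in bytes.
bool isAlignedForModelWidth(const double* m)
{
    const std::size_t alignment = c_modelWidth * sizeof(double);
    return reinterpret_cast<std::uintptr_t>(m) % alignment == 0;
}

// alignas makes the start of the buffer eligible for the aligned variants.
struct AlignedBuffer
{
    alignas(c_modelWidth * sizeof(double)) double data[2 * c_modelWidth];
};
```

Under these assumptions, data and data + c_modelWidth could be passed to an aligned load, while data + 1 would need the unaligned variant.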
 

SIMD implementation load/store operations for integers (corresponding to double)

static SimdDInt32 gmx_simdcall gmx::simdLoad (const std::int32_t *m, SimdDInt32Tag)
 Load aligned SIMD integer data, width corresponds to gmx::SimdDouble. More...
 
static void gmx_simdcall gmx::store (std::int32_t *m, SimdDInt32 a)
 Store aligned SIMD integer data, width corresponds to gmx::SimdDouble. More...
 
static SimdDInt32 gmx_simdcall gmx::simdLoadU (const std::int32_t *m, SimdDInt32Tag)
 Load unaligned integer SIMD data, width corresponds to gmx::SimdDouble. More...
 
static void gmx_simdcall gmx::storeU (std::int32_t *m, SimdDInt32 a)
 Store unaligned SIMD integer data, width corresponds to gmx::SimdDouble. More...
 
static SimdDInt32 gmx_simdcall gmx::setZeroDI ()
 Set all SIMD (double) integer variable elements to 0. More...
 
template<int index>
static std::int32_t gmx_simdcall gmx::extract (SimdDInt32 a)
 Extract the element at the given index from gmx::SimdDInt32. More...
 

SIMD implementation double precision floating-point bitwise logical operations

static SimdDouble gmx_simdcall gmx::operator& (SimdDouble a, SimdDouble b)
 Bitwise and for two SIMD double variables. More...
 
static SimdDouble gmx_simdcall gmx::andNot (SimdDouble a, SimdDouble b)
 Bitwise andnot for SIMD double. More...
 
static SimdDouble gmx_simdcall gmx::operator| (SimdDouble a, SimdDouble b)
 Bitwise or for SIMD double. More...
 
static SimdDouble gmx_simdcall gmx::operator^ (SimdDouble a, SimdDouble b)
 Bitwise xor for SIMD double. More...
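
Bitwise operations on floating-point data act on the raw bit pattern of each element. The sketch below models andNot on a single double using the formula given for the SIMD4 version, c=(~a) & b, and shows a common use: clearing the sign bit to obtain an absolute value. It assumes IEEE 754 binary64 doubles; toBits, fromBits, andNotModel and absViaAndNot are hypothetical helper names.

```cpp
#include <cstdint>
#include <cstring>

static_assert(sizeof(double) == sizeof(std::uint64_t), "model assumes 64-bit doubles");

// Reinterpret a double as its bit pattern and back, without undefined behaviour.
std::uint64_t toBits(double x)
{
    std::uint64_t u;
    std::memcpy(&u, &x, sizeof u);
    return u;
}

double fromBits(std::uint64_t u)
{
    double x;
    std::memcpy(&x, &u, sizeof x);
    return x;
}

// One lane of andNot: (~a) & b on the bit patterns.
double andNotModel(double a, double b)
{
    return fromBits(~toBits(a) & toBits(b));
}

// -0.0 has only the sign bit set, so (~(-0.0)) & x clears the sign bit of x.
double absViaAndNot(double x)
{
    return andNotModel(-0.0, x);
}
```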
 

SIMD implementation double precision floating-point arithmetics

static SimdDouble gmx_simdcall gmx::operator+ (SimdDouble a, SimdDouble b)
 Add two double SIMD variables. More...
 
static SimdDouble gmx_simdcall gmx::operator- (SimdDouble a, SimdDouble b)
 Subtract two double SIMD variables. More...
 
static SimdDouble gmx_simdcall gmx::operator- (SimdDouble a)
 SIMD double precision negate. More...
 
static SimdDouble gmx_simdcall gmx::operator* (SimdDouble a, SimdDouble b)
 Multiply two double SIMD variables. More...
 
static SimdDouble gmx_simdcall gmx::fma (SimdDouble a, SimdDouble b, SimdDouble c)
 SIMD double Fused-multiply-add. Result is a*b+c. More...
 
static SimdDouble gmx_simdcall gmx::fms (SimdDouble a, SimdDouble b, SimdDouble c)
 SIMD double Fused-multiply-subtract. Result is a*b-c. More...
 
static SimdDouble gmx_simdcall gmx::fnma (SimdDouble a, SimdDouble b, SimdDouble c)
 SIMD double Fused-negated-multiply-add. Result is -a*b+c. More...
 
static SimdDouble gmx_simdcall gmx::fnms (SimdDouble a, SimdDouble b, SimdDouble c)
 SIMD double Fused-negated-multiply-subtract. Result is -a*b-c. More...
 
static SimdDouble gmx_simdcall gmx::rsqrt (SimdDouble x)
 double SIMD 1.0/sqrt(x) lookup. More...
 
static SimdDouble gmx_simdcall gmx::rcp (SimdDouble x)
 SIMD double 1.0/x lookup. More...
 
static SimdDouble gmx_simdcall gmx::maskAdd (SimdDouble a, SimdDouble b, SimdDBool m)
 Add two double SIMD variables, masked version. More...
 
static SimdDouble gmx_simdcall gmx::maskzMul (SimdDouble a, SimdDouble b, SimdDBool m)
 Multiply two double SIMD variables, masked version. More...
 
static SimdDouble gmx_simdcall gmx::maskzFma (SimdDouble a, SimdDouble b, SimdDouble c, SimdDBool m)
 SIMD double fused multiply-add, masked version. More...
 
static SimdDouble gmx_simdcall gmx::maskzRsqrt (SimdDouble x, SimdDBool m)
 SIMD double 1.0/sqrt(x) lookup, masked version. More...
 
static SimdDouble gmx_simdcall gmx::maskzRcp (SimdDouble x, SimdDBool m)
 SIMD double 1.0/x lookup, masked version. More...
 
static SimdDouble gmx_simdcall gmx::abs (SimdDouble a)
 SIMD double floating-point fabs(). More...
 
static SimdDouble gmx_simdcall gmx::max (SimdDouble a, SimdDouble b)
 Set each SIMD double element to the largest from two variables. More...
 
static SimdDouble gmx_simdcall gmx::min (SimdDouble a, SimdDouble b)
 Set each SIMD double element to the smallest from two variables. More...
 
static SimdDouble gmx_simdcall gmx::round (SimdDouble a)
 SIMD double round to nearest integer value (in floating-point format). More...
 
static SimdDouble gmx_simdcall gmx::trunc (SimdDouble a)
 Truncate SIMD double, i.e. round towards zero - common hardware instruction. More...
 
template<MathOptimization opt = MathOptimization::Safe>
static SimdDouble gmx_simdcall gmx::frexp (SimdDouble value, SimdDInt32 *exponent)
 Extract (integer) exponent and fraction from double precision SIMD. More...
 
template<MathOptimization opt = MathOptimization::Safe>
static SimdDouble gmx_simdcall gmx::ldexp (SimdDouble value, SimdDInt32 exponent)
 Multiply a SIMD double value by 2 raised to the power of exponent. More...
 
static double gmx_simdcall gmx::reduce (SimdDouble a)
 Return sum of all elements in SIMD double variable. More...
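
The masked arithmetic entries above come in two flavours. The sketch below is a scalar model under two assumptions that should be checked against simd.h: that maskAdd passes a through unchanged where the mask is false, and that the maskz prefix means lanes with a false mask are set to zero. The width of 4 and the names Vec, Mask and the *Model functions are hypothetical.

```cpp
#include <array>
#include <cstddef>

constexpr std::size_t c_width = 4; // hypothetical; the real width is GMX_SIMD_DOUBLE_WIDTH
using Vec  = std::array<double, c_width>;
using Mask = std::array<bool, c_width>;

// Assumed semantics: a+b where the mask is true, a otherwise.
Vec maskAddModel(const Vec& a, const Vec& b, const Mask& m)
{
    Vec r{};
    for (std::size_t i = 0; i < c_width; ++i)
    {
        r[i] = m[i] ? a[i] + b[i] : a[i];
    }
    return r;
}

// Assumed semantics: a*b where the mask is true, 0.0 otherwise.
Vec maskzMulModel(const Vec& a, const Vec& b, const Mask& m)
{
    Vec r{};
    for (std::size_t i = 0; i < c_width; ++i)
    {
        r[i] = m[i] ? a[i] * b[i] : 0.0;
    }
    return r;
}

// Assumed semantics: 1/x where the mask is true, 0.0 otherwise.
// The division is never evaluated for masked-out lanes.
Vec maskzRcpModel(const Vec& x, const Mask& m)
{
    Vec r{};
    for (std::size_t i = 0; i < c_width; ++i)
    {
        r[i] = m[i] ? 1.0 / x[i] : 0.0;
    }
    return r;
}
```

One motivation for the masked reciprocal is padding: lanes that hold dummy zeros can be masked out so that no division by zero is performed for them.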
 

SIMD implementation double precision floating-point comparison, boolean, selection.

static SimdDBool gmx_simdcall gmx::operator== (SimdDouble a, SimdDouble b)
 SIMD a==b for double SIMD. More...
 
static SimdDBool gmx_simdcall gmx::operator!= (SimdDouble a, SimdDouble b)
 SIMD a!=b for double SIMD. More...
 
static SimdDBool gmx_simdcall gmx::operator< (SimdDouble a, SimdDouble b)
 SIMD a<b for double SIMD. More...
 
static SimdDBool gmx_simdcall gmx::operator<= (SimdDouble a, SimdDouble b)
 SIMD a<=b for double SIMD. More...
 
static SimdDBool gmx_simdcall gmx::testBits (SimdDouble a)
 Return true if any bits are set in the double precision SIMD. More...
 
static SimdDBool gmx_simdcall gmx::operator&& (SimdDBool a, SimdDBool b)
 Logical and on double precision SIMD booleans. More...
 
static SimdDBool gmx_simdcall gmx::operator|| (SimdDBool a, SimdDBool b)
 Logical or on double precision SIMD booleans. More...
 
static bool gmx_simdcall gmx::anyTrue (SimdDBool a)
 Returns true if any of the booleans in SIMD a is true, otherwise false. More...
 
static SimdDouble gmx_simdcall gmx::selectByMask (SimdDouble a, SimdDBool mask)
 Select from double precision SIMD variable where boolean is true. More...
 
static SimdDouble gmx_simdcall gmx::selectByNotMask (SimdDouble a, SimdDBool mask)
 Select from double precision SIMD variable where boolean is false. More...
 
static SimdDouble gmx_simdcall gmx::blend (SimdDouble a, SimdDouble b, SimdDBool sel)
 Vector-blend SIMD double selection. More...
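
Comparisons produce a boolean per lane, which the selection functions then consume instead of branching. The scalar model below sketches that pattern. It assumes that selectByMask yields a where the mask is true and zero elsewhere, that selectByNotMask is its complement, and that blend(a, b, sel) picks b where sel is true and a otherwise; the operand order of blend in particular is an assumption to verify against simd.h. All names are hypothetical.

```cpp
#include <array>
#include <cstddef>

constexpr std::size_t c_width = 4; // hypothetical width
using Vec  = std::array<double, c_width>;
using Mask = std::array<bool, c_width>;

// Lane-wise a < b.
Mask lessThanModel(const Vec& a, const Vec& b)
{
    Mask m{};
    for (std::size_t i = 0; i < c_width; ++i)
    {
        m[i] = a[i] < b[i];
    }
    return m;
}

bool anyTrueModel(const Mask& m)
{
    for (bool lane : m)
    {
        if (lane)
        {
            return true;
        }
    }
    return false;
}

// Assumed: a where mask is true, 0 elsewhere.
Vec selectByMaskModel(const Vec& a, const Mask& mask)
{
    Vec r{};
    for (std::size_t i = 0; i < c_width; ++i)
    {
        r[i] = mask[i] ? a[i] : 0.0;
    }
    return r;
}

// Assumed: a where mask is false, 0 elsewhere.
Vec selectByNotMaskModel(const Vec& a, const Mask& mask)
{
    Vec r{};
    for (std::size_t i = 0; i < c_width; ++i)
    {
        r[i] = mask[i] ? 0.0 : a[i];
    }
    return r;
}

// Assumed: b where sel is true, a otherwise.
Vec blendModel(const Vec& a, const Vec& b, const Mask& sel)
{
    Vec r{};
    for (std::size_t i = 0; i < c_width; ++i)
    {
        r[i] = sel[i] ? b[i] : a[i];
    }
    return r;
}
```

For example, blendModel(x, limit, lessThanModel(limit, x)) clamps every lane of x to at most limit without a branch.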
 

SIMD implementation integer (corresponding to double) bitwise logical operations

static SimdDInt32 gmx_simdcall gmx::operator& (SimdDInt32 a, SimdDInt32 b)
 Integer SIMD bitwise and. More...
 
static SimdDInt32 gmx_simdcall gmx::andNot (SimdDInt32 a, SimdDInt32 b)
 Integer SIMD bitwise andnot, (~a) & b. More...
 
static SimdDInt32 gmx_simdcall gmx::operator| (SimdDInt32 a, SimdDInt32 b)
 Integer SIMD bitwise or. More...
 
static SimdDInt32 gmx_simdcall gmx::operator^ (SimdDInt32 a, SimdDInt32 b)
 Integer SIMD bitwise xor. More...
 

SIMD implementation integer (corresponding to double) arithmetics

static SimdDInt32 gmx_simdcall gmx::operator+ (SimdDInt32 a, SimdDInt32 b)
 Add SIMD integers. More...
 
static SimdDInt32 gmx_simdcall gmx::operator- (SimdDInt32 a, SimdDInt32 b)
 Subtract SIMD integers. More...
 
static SimdDInt32 gmx_simdcall gmx::operator* (SimdDInt32 a, SimdDInt32 b)
 Multiply SIMD integers. More...
 

SIMD implementation integer (corresponding to double) comparisons, boolean, selection

static SimdDIBool gmx_simdcall gmx::operator== (SimdDInt32 a, SimdDInt32 b)
 Equality comparison of two integers corresponding to double values. More...
 
static SimdDIBool gmx_simdcall gmx::operator< (SimdDInt32 a, SimdDInt32 b)
 Less-than comparison of two SIMD integers corresponding to double values. More...
 
static SimdDIBool gmx_simdcall gmx::testBits (SimdDInt32 a)
 Check if any bit is set in each element. More...
 
static SimdDIBool gmx_simdcall gmx::operator&& (SimdDIBool a, SimdDIBool b)
 Logical AND on SimdDIBool. More...
 
static SimdDIBool gmx_simdcall gmx::operator|| (SimdDIBool a, SimdDIBool b)
 Logical OR on SimdDIBool. More...
 
static bool gmx_simdcall gmx::anyTrue (SimdDIBool a)
 Returns true if any of the booleans in a is true, otherwise false. More...
 
static SimdDInt32 gmx_simdcall gmx::selectByMask (SimdDInt32 a, SimdDIBool mask)
 Select from gmx::SimdDInt32 variable where boolean is true. More...
 
static SimdDInt32 gmx_simdcall gmx::selectByNotMask (SimdDInt32 a, SimdDIBool mask)
 Select from gmx::SimdDInt32 variable where boolean is false. More...
 
static SimdDInt32 gmx_simdcall gmx::blend (SimdDInt32 a, SimdDInt32 b, SimdDIBool sel)
 Vector-blend SIMD integer selection. More...
 

SIMD implementation conversion operations

static SimdDInt32 gmx_simdcall gmx::cvtR2I (SimdDouble a)
 Round double precision floating point to integer. More...
 
static SimdDInt32 gmx_simdcall gmx::cvttR2I (SimdDouble a)
 Truncate double precision floating point to integer. More...
 
static SimdDouble gmx_simdcall gmx::cvtI2R (SimdDInt32 a)
 Convert integer to double precision floating point. More...
 
static SimdDIBool gmx_simdcall gmx::cvtB2IB (SimdDBool a)
 Convert from double precision boolean to corresponding integer boolean. More...
 
static SimdDBool gmx_simdcall gmx::cvtIB2B (SimdDIBool a)
 Convert from integer boolean to corresponding double precision boolean. More...
 
static SimdDouble gmx_simdcall gmx::cvtF2D (SimdFloat gmx_unused f)
 Convert SIMD float to double. More...
 
static SimdFloat gmx_simdcall gmx::cvtD2F (SimdDouble gmx_unused d)
 Convert SIMD double to float. More...
 
static void gmx_simdcall gmx::cvtF2DD (SimdFloat gmx_unused f, SimdDouble gmx_unused *d0, SimdDouble gmx_unused *d1)
 Convert one SIMD float to two SIMD doubles (d0 and d1). More...
 
static SimdFloat gmx_simdcall gmx::cvtDD2F (SimdDouble gmx_unused d0, SimdDouble gmx_unused d1)
 Convert two SIMD doubles (d0 and d1) to one SIMD float. More...
 
static SimdFInt32 gmx_simdcall gmx::cvtR2I (SimdFloat a)
 Round single precision floating point to integer. More...
 
static SimdFInt32 gmx_simdcall gmx::cvttR2I (SimdFloat a)
 Truncate single precision floating point to integer. More...
 
static SimdFloat gmx_simdcall gmx::cvtI2R (SimdFInt32 a)
 Convert integer to single precision floating point. More...
 
static SimdFIBool gmx_simdcall gmx::cvtB2IB (SimdFBool a)
 Convert from single precision boolean to corresponding integer boolean. More...
 
static SimdFBool gmx_simdcall gmx::cvtIB2B (SimdFIBool a)
 Convert from integer boolean to corresponding single precision boolean. More...
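
The difference between cvtR2I (round) and cvttR2I (truncate) only shows for values with a fractional part. The scalar model below uses std::lrint, which rounds according to the current floating-point rounding mode (round-to-nearest by default), and a plain cast, which truncates towards zero. The function names are hypothetical, and the handling of exact ties in a given hardware implementation is not covered here.

```cpp
#include <cmath>
#include <cstdint>

// One lane of a rounding conversion (default rounding mode: to nearest).
std::int32_t roundToIntModel(double x)
{
    return static_cast<std::int32_t>(std::lrint(x));
}

// One lane of a truncating conversion: the fractional part is discarded.
std::int32_t truncToIntModel(double x)
{
    return static_cast<std::int32_t>(x);
}
```

For 2.7 the two models give 3 and 2; for -2.7 they give -3 and -2.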
 

SIMD implementation load/store operations for single precision floating point

static SimdFloat gmx_simdcall gmx::simdLoad (const float *m, SimdFloatTag={})
 Load GMX_SIMD_FLOAT_WIDTH float numbers from aligned memory. More...
 
static void gmx_simdcall gmx::store (float *m, SimdFloat a)
 Store the contents of SIMD float variable to aligned memory m. More...
 
static SimdFloat gmx_simdcall gmx::simdLoadU (const float *m, SimdFloatTag={})
 Load SIMD float from unaligned memory. More...
 
static void gmx_simdcall gmx::storeU (float *m, SimdFloat a)
 Store SIMD float to unaligned memory. More...
 
static SimdFloat gmx_simdcall gmx::setZeroF ()
 Set all SIMD float variable elements to 0.0. More...
 

SIMD implementation load/store operations for integers (corresponding to float)

static SimdFInt32 gmx_simdcall gmx::simdLoad (const std::int32_t *m, SimdFInt32Tag)
 Load aligned SIMD integer data, width corresponds to gmx::SimdFloat. More...
 
static void gmx_simdcall gmx::store (std::int32_t *m, SimdFInt32 a)
 Store aligned SIMD integer data, width corresponds to gmx::SimdFloat. More...
 
static SimdFInt32 gmx_simdcall gmx::simdLoadU (const std::int32_t *m, SimdFInt32Tag)
 Load unaligned integer SIMD data, width corresponds to gmx::SimdFloat. More...
 
static void gmx_simdcall gmx::storeU (std::int32_t *m, SimdFInt32 a)
 Store unaligned SIMD integer data, width corresponds to gmx::SimdFloat. More...
 
static SimdFInt32 gmx_simdcall gmx::setZeroFI ()
 Set all SIMD (float) integer variable elements to 0. More...
 
template<int index>
static std::int32_t gmx_simdcall gmx::extract (SimdFInt32 a)
 Extract the element at the given index from gmx::SimdFInt32. More...
 

SIMD implementation single precision floating-point bitwise logical operations

static SimdFloat gmx_simdcall gmx::operator& (SimdFloat a, SimdFloat b)
 Bitwise and for two SIMD float variables. More...
 
static SimdFloat gmx_simdcall gmx::andNot (SimdFloat a, SimdFloat b)
 Bitwise andnot for SIMD float. More...
 
static SimdFloat gmx_simdcall gmx::operator| (SimdFloat a, SimdFloat b)
 Bitwise or for SIMD float. More...
 
static SimdFloat gmx_simdcall gmx::operator^ (SimdFloat a, SimdFloat b)
 Bitwise xor for SIMD float. More...
 

SIMD implementation single precision floating-point arithmetics

static SimdFloat gmx_simdcall gmx::operator+ (SimdFloat a, SimdFloat b)
 Add two float SIMD variables. More...
 
static SimdFloat gmx_simdcall gmx::operator- (SimdFloat a, SimdFloat b)
 Subtract two float SIMD variables. More...
 
static SimdFloat gmx_simdcall gmx::operator- (SimdFloat a)
 SIMD single precision negate. More...
 
static SimdFloat gmx_simdcall gmx::operator* (SimdFloat a, SimdFloat b)
 Multiply two float SIMD variables. More...
 
static SimdFloat gmx_simdcall gmx::fma (SimdFloat a, SimdFloat b, SimdFloat c)
 SIMD float Fused-multiply-add. Result is a*b+c. More...
 
static SimdFloat gmx_simdcall gmx::fms (SimdFloat a, SimdFloat b, SimdFloat c)
 SIMD float Fused-multiply-subtract. Result is a*b-c. More...
 
static SimdFloat gmx_simdcall gmx::fnma (SimdFloat a, SimdFloat b, SimdFloat c)
 SIMD float Fused-negated-multiply-add. Result is -a*b+c. More...
 
static SimdFloat gmx_simdcall gmx::fnms (SimdFloat a, SimdFloat b, SimdFloat c)
 SIMD float Fused-negated-multiply-subtract. Result is -a*b-c. More...
 
static SimdFloat gmx_simdcall gmx::rsqrt (SimdFloat x)
 SIMD float 1.0/sqrt(x) lookup. More...
 
static SimdFloat gmx_simdcall gmx::rcp (SimdFloat x)
 SIMD float 1.0/x lookup. More...
 
static SimdFloat gmx_simdcall gmx::maskAdd (SimdFloat a, SimdFloat b, SimdFBool m)
 Add two float SIMD variables, masked version. More...
 
static SimdFloat gmx_simdcall gmx::maskzMul (SimdFloat a, SimdFloat b, SimdFBool m)
 Multiply two float SIMD variables, masked version. More...
 
static SimdFloat gmx_simdcall gmx::maskzFma (SimdFloat a, SimdFloat b, SimdFloat c, SimdFBool m)
 SIMD float fused multiply-add, masked version. More...
 
static SimdFloat gmx_simdcall gmx::maskzRsqrt (SimdFloat x, SimdFBool m)
 SIMD float 1.0/sqrt(x) lookup, masked version. More...
 
static SimdFloat gmx_simdcall gmx::maskzRcp (SimdFloat x, SimdFBool m)
 SIMD float 1.0/x lookup, masked version. More...
 
static SimdFloat gmx_simdcall gmx::abs (SimdFloat a)
 SIMD float Floating-point abs(). More...
 
static SimdFloat gmx_simdcall gmx::max (SimdFloat a, SimdFloat b)
 Set each SIMD float element to the largest from two variables. More...
 
static SimdFloat gmx_simdcall gmx::min (SimdFloat a, SimdFloat b)
 Set each SIMD float element to the smallest from two variables. More...
 
static SimdFloat gmx_simdcall gmx::round (SimdFloat a)
 SIMD float round to nearest integer value (in floating-point format). More...
 
static SimdFloat gmx_simdcall gmx::trunc (SimdFloat a)
 Truncate SIMD float, i.e. round towards zero - common hardware instruction. More...
 
template<MathOptimization opt = MathOptimization::Safe>
static SimdFloat gmx_simdcall gmx::frexp (SimdFloat value, SimdFInt32 *exponent)
 Extract (integer) exponent and fraction from single precision SIMD. More...
 
template<MathOptimization opt = MathOptimization::Safe>
static SimdFloat gmx_simdcall gmx::ldexp (SimdFloat value, SimdFInt32 exponent)
 Multiply a SIMD float value by 2 raised to the power of exponent. More...
 
static float gmx_simdcall gmx::reduce (SimdFloat a)
 Return sum of all elements in SIMD float variable. More...
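
frexp and ldexp are inverse operations: the first splits a value into a fraction and an integer exponent, the second recombines them. The sketch below models both lane by lane with the standard library functions of the same name, assuming the SIMD versions follow the same convention (fraction magnitude in [0.5, 1) for non-zero input). It does not model the MathOptimization template parameter. VecF, VecI, frexpModel and ldexpModel are hypothetical names.

```cpp
#include <array>
#include <cmath>
#include <cstddef>
#include <cstdint>

constexpr std::size_t c_width = 4; // hypothetical; the real width is GMX_SIMD_FLOAT_WIDTH
using VecF = std::array<float, c_width>;
using VecI = std::array<std::int32_t, c_width>;

// Split each lane into fraction (returned) and exponent (written through the pointer).
VecF frexpModel(const VecF& value, VecI* exponent)
{
    VecF fraction{};
    for (std::size_t i = 0; i < c_width; ++i)
    {
        int e = 0;
        fraction[i]    = std::frexp(value[i], &e);
        (*exponent)[i] = e;
    }
    return fraction;
}

// Compute value * 2^exponent in each lane.
VecF ldexpModel(const VecF& value, const VecI& exponent)
{
    VecF r{};
    for (std::size_t i = 0; i < c_width; ++i)
    {
        r[i] = std::ldexp(value[i], exponent[i]);
    }
    return r;
}
```

Feeding the outputs of frexpModel into ldexpModel returns the original values, since both steps are exact in binary floating point.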
 

SIMD implementation single precision floating-point comparisons, boolean, selection.

static SimdFBool gmx_simdcall gmx::operator== (SimdFloat a, SimdFloat b)
 SIMD a==b for single SIMD. More...
 
static SimdFBool gmx_simdcall gmx::operator!= (SimdFloat a, SimdFloat b)
 SIMD a!=b for single SIMD. More...
 
static SimdFBool gmx_simdcall gmx::operator< (SimdFloat a, SimdFloat b)
 SIMD a<b for single SIMD. More...
 
static SimdFBool gmx_simdcall gmx::operator<= (SimdFloat a, SimdFloat b)
 SIMD a<=b for single SIMD. More...
 
static SimdFBool gmx_simdcall gmx::testBits (SimdFloat a)
 Return true if any bits are set in the single precision SIMD. More...
 
static SimdFBool gmx_simdcall gmx::operator&& (SimdFBool a, SimdFBool b)
 Logical and on single precision SIMD booleans. More...
 
static SimdFBool gmx_simdcall gmx::operator|| (SimdFBool a, SimdFBool b)
 Logical or on single precision SIMD booleans. More...
 
static bool gmx_simdcall gmx::anyTrue (SimdFBool a)
 Returns true if any of the booleans in SIMD a is true, otherwise false. More...
 
static SimdFloat gmx_simdcall gmx::selectByMask (SimdFloat a, SimdFBool mask)
 Select from single precision SIMD variable where boolean is true. More...
 
static SimdFloat gmx_simdcall gmx::selectByNotMask (SimdFloat a, SimdFBool mask)
 Select from single precision SIMD variable where boolean is false. More...
 
static SimdFloat gmx_simdcall gmx::blend (SimdFloat a, SimdFloat b, SimdFBool sel)
 Vector-blend SIMD float selection. More...
 

SIMD implementation integer (corresponding to float) bitwise logical operations

static SimdFInt32 gmx_simdcall gmx::operator& (SimdFInt32 a, SimdFInt32 b)
 Integer SIMD bitwise and. More...
 
static SimdFInt32 gmx_simdcall gmx::andNot (SimdFInt32 a, SimdFInt32 b)
 Integer SIMD bitwise andnot, (~a) & b. More...
 
static SimdFInt32 gmx_simdcall gmx::operator| (SimdFInt32 a, SimdFInt32 b)
 Integer SIMD bitwise or. More...
 
static SimdFInt32 gmx_simdcall gmx::operator^ (SimdFInt32 a, SimdFInt32 b)
 Integer SIMD bitwise xor. More...
 

SIMD implementation integer (corresponding to float) arithmetics

static SimdFInt32 gmx_simdcall gmx::operator+ (SimdFInt32 a, SimdFInt32 b)
 Add SIMD integers. More...
 
static SimdFInt32 gmx_simdcall gmx::operator- (SimdFInt32 a, SimdFInt32 b)
 Subtract SIMD integers. More...
 
static SimdFInt32 gmx_simdcall gmx::operator* (SimdFInt32 a, SimdFInt32 b)
 Multiply SIMD integers. More...
 

SIMD implementation integer (corresponding to float) comparisons, boolean, selection

static SimdFIBool gmx_simdcall gmx::operator== (SimdFInt32 a, SimdFInt32 b)
 Equality comparison of two integers corresponding to float values. More...
 
static SimdFIBool gmx_simdcall gmx::operator< (SimdFInt32 a, SimdFInt32 b)
 Less-than comparison of two SIMD integers corresponding to float values. More...
 
static SimdFIBool gmx_simdcall gmx::testBits (SimdFInt32 a)
 Check if any bit is set in each element. More...
 
static SimdFIBool gmx_simdcall gmx::operator&& (SimdFIBool a, SimdFIBool b)
 Logical AND on SimdFIBool. More...
 
static SimdFIBool gmx_simdcall gmx::operator|| (SimdFIBool a, SimdFIBool b)
 Logical OR on SimdFIBool. More...
 
static bool gmx_simdcall gmx::anyTrue (SimdFIBool a)
 Returns true if any of the booleans in a is true, otherwise false. More...
 
static SimdFInt32 gmx_simdcall gmx::selectByMask (SimdFInt32 a, SimdFIBool mask)
 Select from gmx::SimdFInt32 variable where boolean is true. More...
 
static SimdFInt32 gmx_simdcall gmx::selectByNotMask (SimdFInt32 a, SimdFIBool mask)
 Select from gmx::SimdFInt32 variable where boolean is false. More...
 
static SimdFInt32 gmx_simdcall gmx::blend (SimdFInt32 a, SimdFInt32 b, SimdFIBool sel)
 Vector-blend SIMD integer selection. More...
 

Higher-level SIMD utility functions, double precision.

These include generic functions to work with triplets of data, typically coordinates, and a few utility functions to load and update data in the nonbonded kernels. These functions should be available on all implementations.
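
To illustrate what working with triplets means here, the scalar model below gathers x, y and z for several atoms from a packed coordinate array into three per-component vectors, and scatters per-component increments back. It assumes that lane i addresses base + align * offset[i], which should be checked against the detailed documentation of gatherLoadUTranspose and transposeScatterIncrU. The width of 4 and all names ending in Model are hypothetical.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

constexpr std::size_t c_width = 4; // hypothetical; the real width is GMX_SIMD_DOUBLE_WIDTH
using Vec = std::array<double, c_width>;

// Gather 3 consecutive values per lane and transpose them into v0, v1, v2.
template<int align>
void gatherLoadUTransposeModel(const double* base, const std::int32_t offset[], Vec* v0, Vec* v1, Vec* v2)
{
    for (std::size_t i = 0; i < c_width; ++i)
    {
        const double* p = base + align * offset[i]; // assumed addressing rule
        (*v0)[i]        = p[0];
        (*v1)[i]        = p[1];
        (*v2)[i]        = p[2];
    }
}

// Transpose v0, v1, v2 back to triplets and add them to memory.
template<int align>
void transposeScatterIncrUModel(double* base, const std::int32_t offset[], const Vec& v0, const Vec& v1, const Vec& v2)
{
    for (std::size_t i = 0; i < c_width; ++i)
    {
        double* p = base + align * offset[i];
        p[0] += v0[i];
        p[1] += v1[i];
        p[2] += v2[i];
    }
}
```

With align = 3 and offsets {0, 2, 5, 1}, lane 1 reads the triplet that starts at index 6 of the packed array, so v0 collects the x components of atoms 0, 2, 5 and 1.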

static const int gmx::c_simdBestPairAlignmentDouble = 2
 Best alignment to use for aligned pairs of double data. More...
 
template<int align>
static void gmx_simdcall gmx::gatherLoadTranspose (const double *base, const std::int32_t offset[], SimdDouble *v0, SimdDouble *v1, SimdDouble *v2, SimdDouble *v3)
 Load 4 consecutive doubles from each of GMX_SIMD_DOUBLE_WIDTH offsets, and transpose into 4 SIMD double variables. More...
 
template<int align>
static void gmx_simdcall gmx::gatherLoadTranspose (const double *base, const std::int32_t offset[], SimdDouble *v0, SimdDouble *v1)
 Load 2 consecutive doubles from each of GMX_SIMD_DOUBLE_WIDTH offsets, and transpose into 2 SIMD double variables. More...
 
template<int align>
static void gmx_simdcall gmx::gatherLoadUTranspose (const double *base, const std::int32_t offset[], SimdDouble *v0, SimdDouble *v1, SimdDouble *v2)
 Load 3 consecutive doubles from each of GMX_SIMD_DOUBLE_WIDTH offsets, and transpose into 3 SIMD double variables. More...
 
template<int align>
static void gmx_simdcall gmx::transposeScatterStoreU (double *base, const std::int32_t offset[], SimdDouble v0, SimdDouble v1, SimdDouble v2)
 Transpose and store 3 SIMD doubles to 3 consecutive addresses at GMX_SIMD_DOUBLE_WIDTH offsets. More...
 
template<int align>
static void gmx_simdcall gmx::transposeScatterIncrU (double *base, const std::int32_t offset[], SimdDouble v0, SimdDouble v1, SimdDouble v2)
 Transpose and add 3 SIMD doubles to 3 consecutive addresses at GMX_SIMD_DOUBLE_WIDTH offsets. More...
 
template<int align>
static void gmx_simdcall gmx::transposeScatterDecrU (double *base, const std::int32_t offset[], SimdDouble v0, SimdDouble v1, SimdDouble v2)
 Transpose and subtract 3 SIMD doubles to 3 consecutive addresses at GMX_SIMD_DOUBLE_WIDTH offsets. More...
 
static void gmx_simdcall gmx::expandScalarsToTriplets (SimdDouble scalar, SimdDouble *triplets0, SimdDouble *triplets1, SimdDouble *triplets2)
 Expand each element of double SIMD variable into three identical consecutive elements in three SIMD outputs. More...
 
template<int align>
static void gmx_simdcall gmx::gatherLoadBySimdIntTranspose (const double *base, SimdDInt32 offset, SimdDouble *v0, SimdDouble *v1, SimdDouble *v2, SimdDouble *v3)
 Load 4 consecutive doubles from each of GMX_SIMD_DOUBLE_WIDTH offsets specified by a SIMD integer, transpose into 4 SIMD double variables. More...
 
template<int align>
static void gmx_simdcall gmx::gatherLoadUBySimdIntTranspose (const double *base, SimdDInt32 offset, SimdDouble *v0, SimdDouble *v1)
 Load 2 consecutive doubles from each of GMX_SIMD_DOUBLE_WIDTH offsets (unaligned) specified by SIMD integer, transpose into 2 SIMD doubles. More...
 
template<int align>
static void gmx_simdcall gmx::gatherLoadBySimdIntTranspose (const double *base, SimdDInt32 offset, SimdDouble *v0, SimdDouble *v1)
 Load 2 consecutive doubles from each of GMX_SIMD_DOUBLE_WIDTH offsets specified by a SIMD integer, transpose into 2 SIMD double variables. More...
 
static double gmx_simdcall gmx::reduceIncr4ReturnSum (double *m, SimdDouble v0, SimdDouble v1, SimdDouble v2, SimdDouble v3)
 Reduce each of four SIMD doubles, add those values to four consecutive doubles in memory, return sum. More...
 

Higher-level SIMD utilities accessing partial (half-width) SIMD doubles.

See the single-precision versions for documentation. Since double precision is typically half the width of single, this double version is likely only useful with 512-bit and larger implementations.

static SimdDouble gmx_simdcall gmx::loadDualHsimd (const double *m0, const double *m1)
 Load low & high parts of SIMD double from different locations. More...
 
static SimdDouble gmx_simdcall gmx::loadDuplicateHsimd (const double *m)
 Load half-SIMD-width double data, spread to both halves. More...
 
static SimdDouble gmx_simdcall gmx::loadU1DualHsimd (const double *m)
 Load two doubles, spread 1st in low half, 2nd in high half. More...
 
static void gmx_simdcall gmx::storeDualHsimd (double *m0, double *m1, SimdDouble a)
 Store low & high parts of SIMD double to different locations. More...
 
static void gmx_simdcall gmx::incrDualHsimd (double *m0, double *m1, SimdDouble a)
 Add each half of SIMD variable to separate memory addresses. More...
 
static void gmx_simdcall gmx::decr3Hsimd (double *m, SimdDouble a0, SimdDouble a1, SimdDouble a2)
 Add the two halves of three SIMD doubles, subtract the sum from three half-SIMD-width consecutive doubles in memory. More...
 
template<int align>
static void gmx_simdcall gmx::gatherLoadTransposeHsimd (const double *base0, const double *base1, std::int32_t offset[], SimdDouble *v0, SimdDouble *v1)
 Load 2 consecutive doubles from each of GMX_SIMD_DOUBLE_WIDTH/2 offsets, transpose into SIMD double (low half from base0, high from base1). More...
 
static double gmx_simdcall gmx::reduceIncr4ReturnSumHsimd (double *m, SimdDouble v0, SimdDouble v1)
 Reduce the 4 half-SIMD-width doubles in 2 SIMD variables (sum halves), increment four consecutive doubles in memory, return sum. More...
 
static SimdDouble gmx_simdcall gmx::loadUNDuplicate4 (const double *m)
 Load N doubles and duplicate them 4 times each. More...
 
static SimdDouble gmx_simdcall gmx::load4DuplicateN (const double *m)
 Load 4 doubles and duplicate them N times each. More...
 
static SimdDouble gmx_simdcall gmx::loadU4NOffset (const double *m, int offset)
 Load doubles in blocks of 4 at fixed offsets. More...
 

Higher-level SIMD utility functions, single precision.

These include generic functions to work with triplets of data, typically coordinates, and a few utility functions to load and update data in the nonbonded kernels. These functions should be available on all implementations, although some wide SIMD implementations (width>=8) also provide special optional versions to work with half or quarter registers to improve the performance in the nonbonded kernels.

static const int gmx::c_simdBestPairAlignmentFloat = 2
 Best alignment to use for aligned pairs of float data. More...
 
template<int align>
static void gmx_simdcall gmx::gatherLoadTranspose (const float *base, const std::int32_t offset[], SimdFloat *v0, SimdFloat *v1, SimdFloat *v2, SimdFloat *v3)
 Load 4 consecutive floats from each of GMX_SIMD_FLOAT_WIDTH offsets, and transpose into 4 SIMD float variables. More...
 
template<int align>
static void gmx_simdcall gmx::gatherLoadTranspose (const float *base, const std::int32_t offset[], SimdFloat *v0, SimdFloat *v1)
 Load 2 consecutive floats from each of GMX_SIMD_FLOAT_WIDTH offsets, and transpose into 2 SIMD float variables. More...
 
template<int align>
static void gmx_simdcall gmx::gatherLoadUTranspose (const float *base, const std::int32_t offset[], SimdFloat *v0, SimdFloat *v1, SimdFloat *v2)
 Load 3 consecutive floats from each of GMX_SIMD_FLOAT_WIDTH offsets, and transpose into 3 SIMD float variables. More...
 
template<int align>
static void gmx_simdcall gmx::transposeScatterStoreU (float *base, const std::int32_t offset[], SimdFloat v0, SimdFloat v1, SimdFloat v2)
 Transpose and store 3 SIMD floats to 3 consecutive addresses at GMX_SIMD_FLOAT_WIDTH offsets. More...
 
template<int align>
static void gmx_simdcall gmx::transposeScatterIncrU (float *base, const std::int32_t offset[], SimdFloat v0, SimdFloat v1, SimdFloat v2)
 Transpose and add 3 SIMD floats to 3 consecutive addresses at GMX_SIMD_FLOAT_WIDTH offsets. More...
 
template<int align>
static void gmx_simdcall gmx::transposeScatterDecrU (float *base, const std::int32_t offset[], SimdFloat v0, SimdFloat v1, SimdFloat v2)
 Transpose and subtract 3 SIMD floats to 3 consecutive addresses at GMX_SIMD_FLOAT_WIDTH offsets. More...
 
static void gmx_simdcall gmx::expandScalarsToTriplets (SimdFloat scalar, SimdFloat *triplets0, SimdFloat *triplets1, SimdFloat *triplets2)
 Expand each element of float SIMD variable into three identical consecutive elements in three SIMD outputs. More...
 
template<int align>
static void gmx_simdcall gmx::gatherLoadBySimdIntTranspose (const float *base, SimdFInt32 offset, SimdFloat *v0, SimdFloat *v1, SimdFloat *v2, SimdFloat *v3)
 Load 4 consecutive floats from each of GMX_SIMD_FLOAT_WIDTH offsets specified by a SIMD integer, transpose into 4 SIMD float variables. More...
 
template<int align>
static void gmx_simdcall gmx::gatherLoadUBySimdIntTranspose (const float *base, SimdFInt32 offset, SimdFloat *v0, SimdFloat *v1)
 Load 2 consecutive floats from each of GMX_SIMD_FLOAT_WIDTH offsets (unaligned) specified by SIMD integer, transpose into 2 SIMD floats. More...
 
template<int align>
static void gmx_simdcall gmx::gatherLoadBySimdIntTranspose (const float *base, SimdFInt32 offset, SimdFloat *v0, SimdFloat *v1)
 Load 2 consecutive floats from each of GMX_SIMD_FLOAT_WIDTH offsets specified by a SIMD integer, transpose into 2 SIMD float variables. More...
 
static float gmx_simdcall gmx::reduceIncr4ReturnSum (float *m, SimdFloat v0, SimdFloat v1, SimdFloat v2, SimdFloat v3)
 Reduce each of four SIMD floats, add those values to four consecutive floats in memory, return sum. More...
 

Higher-level SIMD utilities accessing partial (half-width) SIMD floats.

These functions are optional. They are only useful for SIMD implementations where the width is 8 or larger, and where it would be inefficient to process 4*8, 8*8, or more, interactions in parallel.

Currently, only Intel provides very wide SIMD implementations, but these also come with excellent support for loading, storing, accessing and shuffling parts of the register in so-called 'lanes' of 128 bits (four floats) each. We can use this to load separate parts into the low/high halves of the register in the inner loop of the nonbonded kernel, which e.g. makes it possible to process 4*4 nonbonded interactions as a pattern of 2*8. We can also use implementations with width 16 or greater.

To make this more generic, when GMX_SIMD_HAVE_HSIMD_UTIL_REAL is 1, the SIMD implementation provides seven special routines that:

  • Load the low/high parts of a SIMD variable from different pointers
  • Load half the SIMD width from one pointer, and duplicate in low/high parts
  • Load two reals, put 1st one in all low elements, and 2nd in all high ones.
  • Store the low/high parts of a SIMD variable to different pointers
  • Subtract both SIMD halves from a single half-SIMD-width memory location.
  • Load aligned pairs (LJ parameters) from two base pointers, with a common offset list, and put these in the low/high SIMD halves.
  • Reduce each half of two SIMD registers (i.e., 4 parts in total), increment four adjacent memory positions, and return the total sum.
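
As a rough illustration of the first and fourth routines in the list above, the following standalone sketch emulates a width-8 float register with a std::array. The names HalfDemoVec, loadDualHsimdSketch and storeDualHsimdSketch are hypothetical stand-ins and not GROMACS code; a real implementation would use architecture intrinsics rather than loops.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical emulation of a width-8 float SIMD register.
constexpr std::size_t c_demoWidth = 8;
constexpr std::size_t c_demoHalf  = c_demoWidth / 2;
using HalfDemoVec                 = std::array<float, c_demoWidth>;

// Fill the low half from m0 and the high half from m1.
inline HalfDemoVec loadDualHsimdSketch(const float* m0, const float* m1)
{
    assert(m0 != nullptr && m1 != nullptr);
    HalfDemoVec v{};
    for (std::size_t i = 0; i < c_demoHalf; i++)
    {
        v[i]              = m0[i];
        v[c_demoHalf + i] = m1[i];
    }
    return v;
}

// Write the low half to m0 and the high half to m1.
inline void storeDualHsimdSketch(float* m0, float* m1, const HalfDemoVec& a)
{
    assert(m0 != nullptr && m1 != nullptr);
    for (std::size_t i = 0; i < c_demoHalf; i++)
    {
        m0[i] = a[i];
        m1[i] = a[c_demoHalf + i];
    }
}
```

In a nonbonded inner loop, m0 and m1 would typically point at data for two different i-particles, so that one register carries both.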

Remember: this is ONLY used when the native SIMD width is large. You will just waste time if you implement it for normal 16-byte SIMD architectures.

This is part of the new C++ SIMD interface, so these functions are only available when using C++. Since some Gromacs code relying on the SIMD module is still C (not C++), we have kept the C-style naming for now - this will change once we are entirely C++.

static SimdFloat gmx_simdcall gmx::loadDualHsimd (const float *m0, const float *m1)
 Load low & high parts of SIMD float from different locations. More...
 
static SimdFloat gmx_simdcall gmx::loadDuplicateHsimd (const float *m)
 Load half-SIMD-width float data, spread to both halves. More...
 
static SimdFloat gmx_simdcall gmx::loadU1DualHsimd (const float *m)
 Load two floats, spread 1st in low half, 2nd in high half. More...
 
static void gmx_simdcall gmx::storeDualHsimd (float *m0, float *m1, SimdFloat a)
 Store low & high parts of SIMD float to different locations. More...
 
static void gmx_simdcall gmx::incrDualHsimd (float *m0, float *m1, SimdFloat a)
 Add each half of SIMD variable to separate memory addresses. More...
 
static void gmx_simdcall gmx::decr3Hsimd (float *m, SimdFloat a0, SimdFloat a1, SimdFloat a2)
 Add the two halves of three SIMD floats, subtract the sum from three half-SIMD-width consecutive floats in memory. More...
 
template<int align>
static void gmx_simdcall gmx::gatherLoadTransposeHsimd (const float *base0, const float *base1, const std::int32_t offset[], SimdFloat *v0, SimdFloat *v1)
 Load 2 consecutive floats from each of GMX_SIMD_FLOAT_WIDTH/2 offsets, transpose into SIMD float (low half from base0, high from base1). More...
 
static float gmx_simdcall gmx::reduceIncr4ReturnSumHsimd (float *m, SimdFloat v0, SimdFloat v1)
 Reduce the 4 half-SIMD-width floats in 2 SIMD variables (sum halves), increment four consecutive floats in memory, return sum. More...
 
static SimdFloat gmx_simdcall gmx::loadUNDuplicate4 (const float *m)
 Load N floats and duplicate them 4 times each. More...
 
static SimdFloat gmx_simdcall gmx::load4DuplicateN (const float *m)
 Load 4 floats and duplicate them N times each. More...
 
static SimdFloat gmx_simdcall gmx::loadU4NOffset (const float *m, int offset)
 Load floats in blocks of 4 at fixed offsets. More...
 

SIMD predefined macros to describe high-level capabilities

These macros are used to describe the features available in default Gromacs real precision. They are set from the lower-level implementation files that have macros describing single and double precision individually, as well as the implementation details.

#define GMX_SIMD_HAVE_REAL   GMX_SIMD_HAVE_FLOAT
 1 if SimdReal is available, otherwise 0. More...
 
#define GMX_SIMD_REAL_WIDTH   GMX_SIMD_FLOAT_WIDTH
 Width of SimdReal. More...
 
#define GMX_SIMD_HAVE_INT32_EXTRACT   GMX_SIMD_HAVE_FINT32_EXTRACT
 1 if support is available for extracting elements from SimdInt32, otherwise 0 More...
 
#define GMX_SIMD_HAVE_INT32_LOGICAL   GMX_SIMD_HAVE_FINT32_LOGICAL
 1 if logical ops are supported on SimdInt32, otherwise 0. More...
 
#define GMX_SIMD_HAVE_INT32_ARITHMETICS   GMX_SIMD_HAVE_FINT32_ARITHMETICS
 1 if arithmetic ops are supported on SimdInt32, otherwise 0. More...
 
#define GMX_SIMD_HAVE_GATHER_LOADU_BYSIMDINT_TRANSPOSE_REAL   GMX_SIMD_HAVE_GATHER_LOADU_BYSIMDINT_TRANSPOSE_FLOAT
 1 if gmx::gatherLoadUBySimdIntTranspose is present, otherwise 0 More...
 
#define GMX_SIMD_HAVE_HSIMD_UTIL_REAL   GMX_SIMD_HAVE_HSIMD_UTIL_FLOAT
 1 if real half-register load/store/reduce utils present, otherwise 0 More...
 
#define GMX_SIMD4_HAVE_REAL   GMX_SIMD4_HAVE_FLOAT
 1 if Simd4Real is available, otherwise 0. More...
 

Classes

class  gmx::Simd4Double
 SIMD4 double type. More...
 
class  gmx::Simd4DBool
 SIMD4 variable type to use for logical comparisons on doubles. More...
 
class  gmx::Simd4Float
 SIMD4 float type. More...
 
class  gmx::Simd4FBool
 SIMD4 variable type to use for logical comparisons on floats. More...
 
class  gmx::SimdDouble
 Double SIMD variable. Available if GMX_SIMD_HAVE_DOUBLE is 1. More...
 
class  gmx::SimdDInt32
 Integer SIMD variable type to use for conversions to/from double. More...
 
class  gmx::SimdDBool
 Boolean type for double SIMD data. More...
 
class  gmx::SimdDIBool
 Boolean type for integer datatypes corresponding to double SIMD. More...
 
class  gmx::SimdFloat
 Float SIMD variable. Available if GMX_SIMD_HAVE_FLOAT is 1. More...
 
class  gmx::SimdFInt32
 Integer SIMD variable type to use for conversions to/from float. More...
 
class  gmx::SimdFBool
 Boolean type for float SIMD data. More...
 
class  gmx::SimdFIBool
 Boolean type for integer datatypes corresponding to float SIMD. More...
 

Directories

directory simd
 SIMD intrinsics interface (simd)
 
directory tests
 Unit tests for SIMD intrinsics interface (simd).
 

Files

file  simd_support.h
 Functions to query compiled and supported SIMD architectures.
 
file  hsimd_declarations.h
 Declares all Hsimd functions that are not supported.
 
file  impl_reference.h
 Reference SIMD implementation, including SIMD documentation.
 
file  impl_reference_definitions.h
 Reference SIMD implementation, including SIMD documentation.
 
file  impl_reference_general.h
 Reference SIMD implementation, general utility functions.
 
file  impl_reference_simd4_double.h
 Reference implementation, SIMD4 double precision.
 
file  impl_reference_simd4_float.h
 Reference implementation, SIMD4 single precision.
 
file  impl_reference_simd_double.h
 Reference implementation, SIMD double precision.
 
file  impl_reference_simd_float.h
 Reference implementation, SIMD single precision.
 
file  impl_reference_util_double.h
 Reference impl., higher-level double prec. SIMD utility functions.
 
file  impl_reference_util_float.h
 Reference impl., higher-level single prec. SIMD utility functions.
 
file  scalar.h
 Scalar float functions corresponding to GROMACS SIMD functions.
 
file  scalar_math.h
 Scalar math functions mimicking GROMACS SIMD math functions.
 
file  scalar_util.h
 Scalar utility functions mimicking GROMACS SIMD utility functions.
 
file  simd.h
 Definitions, capabilities, and wrappers for SIMD module.
 
file  simd_math.h
 Math functions for SIMD datatypes.
 
file  simd_memory.h
 Declares SimdArrayRef.
 
file  vector_operations.h
 SIMD operations corresponding to Gromacs rvec datatypes.
 

Macro Definition Documentation

#define GMX_SIMD4_HAVE_REAL   GMX_SIMD4_HAVE_FLOAT

1 if Simd4Real is available, otherwise 0.

GMX_SIMD4_HAVE_DOUBLE if GMX_DOUBLE is 1, otherwise GMX_SIMD4_HAVE_FLOAT.

#define GMX_SIMD_HAVE_GATHER_LOADU_BYSIMDINT_TRANSPOSE_REAL   GMX_SIMD_HAVE_GATHER_LOADU_BYSIMDINT_TRANSPOSE_FLOAT

1 if gmx::gatherLoadUBySimdIntTranspose is present, otherwise 0

GMX_SIMD_HAVE_GATHER_LOADU_BYSIMDINT_TRANSPOSE_DOUBLE if GMX_DOUBLE is 1, otherwise GMX_SIMD_HAVE_GATHER_LOADU_BYSIMDINT_TRANSPOSE_FLOAT.

#define GMX_SIMD_HAVE_HSIMD_UTIL_REAL   GMX_SIMD_HAVE_HSIMD_UTIL_FLOAT

1 if real half-register load/store/reduce utils present, otherwise 0

GMX_SIMD_HAVE_HSIMD_UTIL_DOUBLE if GMX_DOUBLE is 1, otherwise GMX_SIMD_HAVE_HSIMD_UTIL_FLOAT.

#define GMX_SIMD_HAVE_INT32_ARITHMETICS   GMX_SIMD_HAVE_FINT32_ARITHMETICS

1 if arithmetic ops are supported on SimdInt32, otherwise 0.

GMX_SIMD_HAVE_DINT32_ARITHMETICS if GMX_DOUBLE is 1, otherwise GMX_SIMD_HAVE_FINT32_ARITHMETICS.

#define GMX_SIMD_HAVE_INT32_EXTRACT   GMX_SIMD_HAVE_FINT32_EXTRACT

1 if support is available for extracting elements from SimdInt32, otherwise 0

GMX_SIMD_HAVE_DINT32_EXTRACT if GMX_DOUBLE is 1, otherwise GMX_SIMD_HAVE_FINT32_EXTRACT.

#define GMX_SIMD_HAVE_INT32_LOGICAL   GMX_SIMD_HAVE_FINT32_LOGICAL

1 if logical ops are supported on SimdInt32, otherwise 0.

GMX_SIMD_HAVE_DINT32_LOGICAL if GMX_DOUBLE is 1, otherwise GMX_SIMD_HAVE_FINT32_LOGICAL.

#define GMX_SIMD_HAVE_REAL   GMX_SIMD_HAVE_FLOAT

1 if SimdReal is available, otherwise 0.

GMX_SIMD_HAVE_DOUBLE if GMX_DOUBLE is 1, otherwise GMX_SIMD_HAVE_FLOAT.

#define GMX_SIMD_REAL_WIDTH   GMX_SIMD_FLOAT_WIDTH

Width of SimdReal.

GMX_SIMD_DOUBLE_WIDTH if GMX_DOUBLE is 1, otherwise GMX_SIMD_FLOAT_WIDTH.
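
The pattern shared by the macros above can be sketched as follows. This is a standalone illustration only: the width values 8 and 4 and the names RealSketch and simdRealWidthSketch are assumptions made for the example, whereas in GROMACS the precision-specific macros are set by the build system and the implementation headers.

```cpp
// Hypothetical stand-ins; in GROMACS these come from the build system and
// the architecture-specific implementation headers.
#ifndef GMX_DOUBLE
#    define GMX_DOUBLE 0
#endif
#define GMX_SIMD_FLOAT_WIDTH 8
#define GMX_SIMD_DOUBLE_WIDTH 4

// Select the "real" width from the precision-specific macro.
#if GMX_DOUBLE
#    define GMX_SIMD_REAL_WIDTH GMX_SIMD_DOUBLE_WIDTH
using RealSketch = double;
#else
#    define GMX_SIMD_REAL_WIDTH GMX_SIMD_FLOAT_WIDTH
using RealSketch = float;
#endif

// Number of elements a loop over reals would process per SIMD step.
constexpr int simdRealWidthSketch()
{
    return GMX_SIMD_REAL_WIDTH;
}
```

Code written against the REAL macros then compiles unchanged for either precision.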

Function Documentation

static Simd4Float gmx_simdcall gmx::abs ( Simd4Float  a)
inlinestatic

SIMD4 Floating-point fabs().

Parameters
a  any floating point values
Returns
fabs(a) for each element.
static Simd4Double gmx_simdcall gmx::abs ( Simd4Double  a)
inlinestatic

SIMD4 Floating-point abs().

Parameters
a  any floating point values
Returns
fabs(a) for each element.
static SimdFloat gmx_simdcall gmx::abs ( SimdFloat  a)
inlinestatic

SIMD float Floating-point abs().

Parameters
a  any floating point values
Returns
abs(a) for each element.
static SimdDouble gmx_simdcall gmx::abs ( SimdDouble  a)
inlinestatic

SIMD double floating-point fabs().

Parameters
a  any floating point values
Returns
fabs(a) for each element.
static Simd4Double gmx_simdcall gmx::andNot ( Simd4Double  a,
Simd4Double  b 
)
inlinestatic

Bitwise andnot for two SIMD4 double variables. c=(~a) & b.

Available if GMX_SIMD_HAVE_LOGICAL is 1.

Parameters
a  data1
b  data2
Returns
(~data1) & data2
static Simd4Float gmx_simdcall gmx::andNot ( Simd4Float  a,
Simd4Float  b 
)
inlinestatic

Bitwise andnot for two SIMD4 float variables. c=(~a) & b.

Available if GMX_SIMD_HAVE_LOGICAL is 1.

Parameters
a  data1
b  data2
Returns
(~data1) & data2
static SimdFloat gmx_simdcall gmx::andNot ( SimdFloat  a,
SimdFloat  b 
)
inlinestatic

Bitwise andnot for SIMD float.

Available if GMX_SIMD_HAVE_LOGICAL is 1.

Parameters
a  data1
b  data2
Returns
(~data1) & data2
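
One common use of a floating-point andnot is clearing the sign bit. The sketch below is a scalar stand-in for a single SIMD element, assuming 32-bit IEEE-754 floats; andNotSketch and absViaAndNotSketch are hypothetical names, not GROMACS functions.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

// Scalar stand-in for one SIMD element: returns (~a) & b on the raw bits.
inline float andNotSketch(float a, float b)
{
    static_assert(sizeof(float) == sizeof(std::uint32_t), "assumes 32-bit float");
    std::uint32_t ia;
    std::uint32_t ib;
    std::memcpy(&ia, &a, sizeof(ia));
    std::memcpy(&ib, &b, sizeof(ib));
    const std::uint32_t ir = (~ia) & ib;
    float               r;
    std::memcpy(&r, &ir, sizeof(r));
    return r;
}

// With a = -0.0f only the sign bit of a is set (IEEE-754 assumed), so
// (~a) & x keeps every bit of x except the sign.
inline float absViaAndNotSketch(float x)
{
    const float r = andNotSketch(-0.0f, x);
    assert(!std::signbit(r)); // the sign bit must be clear afterwards
    return r;
}
```
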
static SimdDouble gmx_simdcall gmx::andNot ( SimdDouble  a,
SimdDouble  b 
)
inlinestatic

Bitwise andnot for SIMD double.

Available if GMX_SIMD_HAVE_LOGICAL is 1.

Parameters
a  data1
b  data2
Returns
(~data1) & data2
static SimdFInt32 gmx_simdcall gmx::andNot ( SimdFInt32  a,
SimdFInt32  b 
)
inlinestatic

Integer SIMD bitwise andnot.

Available if GMX_SIMD_HAVE_FINT32_LOGICAL is 1.

Note
You cannot use this operation directly to select based on a boolean SIMD variable, since booleans are separate from integer SIMD. If that is what you need, have a look at gmx::selectByMask instead.
Parameters
a  integer SIMD
b  integer SIMD
Returns
(~a) & b
static SimdDInt32 gmx_simdcall gmx::andNot ( SimdDInt32  a,
SimdDInt32  b 
)
inlinestatic

Integer SIMD bitwise andnot.

Available if GMX_SIMD_HAVE_DINT32_LOGICAL is 1.

Note
You cannot use this operation directly to select based on a boolean SIMD variable, since booleans are separate from integer SIMD. If that is what you need, have a look at gmx::selectByMask instead.
Parameters
a  integer SIMD
b  integer SIMD
Returns
(~a) & b
static bool gmx_simdcall gmx::anyTrue ( Simd4FBool  a)
inlinestatic

Returns non-zero if any of the booleans in SIMD4 a is true, otherwise 0.

Parameters
a  Logical variable.
Returns
true if any element in a is true, otherwise false.

The actual return value for truth will depend on the architecture, so any non-zero value is considered truth.

static bool gmx_simdcall gmx::anyTrue ( Simd4DBool  a)
inlinestatic

Returns non-zero if any of the booleans in SIMD4 a is true, otherwise 0.

Parameters
a  Logical variable.
Returns
true if any element in a is true, otherwise false.

The actual return value for truth will depend on the architecture, so any non-zero value is considered truth.

static bool gmx_simdcall gmx::anyTrue ( SimdFBool  a)
inlinestatic

Returns non-zero if any of the booleans in SIMD a is true, otherwise 0.

Parameters
a  Logical variable.
Returns
true if any element in a is true, otherwise false.

The actual return value for truth will depend on the architecture, so any non-zero value is considered truth.

static bool gmx_simdcall gmx::anyTrue ( SimdDBool  a)
inlinestatic

Returns non-zero if any of the booleans in SIMD a is true, otherwise 0.

Parameters
a  Logical variable.
Returns
true if any element in a is true, otherwise false.

The actual return value for truth will depend on the architecture, so any non-zero value is considered truth.

static bool gmx_simdcall gmx::anyTrue ( SimdFIBool  a)
inlinestatic

Returns non-zero if any of the booleans in a is true, otherwise 0.

Available if GMX_SIMD_HAVE_FINT32_ARITHMETICS is 1.

The actual return value for "any true" will depend on the architecture. Any non-zero value should be considered truth.

Parameters
a  SIMD boolean
Returns
True if any of the elements in a is true, otherwise 0.
static bool gmx_simdcall gmx::anyTrue ( SimdDIBool  a)
inlinestatic

Returns non-zero if any of the booleans in a is true, otherwise 0.

Available if GMX_SIMD_HAVE_DINT32_ARITHMETICS is 1.

The actual return value for "any true" will depend on the architecture. Any non-zero value should be considered truth.

Parameters
a  SIMD boolean
Returns
True if any of the elements in a is true, otherwise 0.
static Simd4Float gmx_simdcall gmx::blend ( Simd4Float  a,
Simd4Float  b,
Simd4FBool  sel 
)
inlinestatic

Vector-blend SIMD4 selection.

Parameters
a  First source
b  Second source
sel  Boolean selector
Returns
For each element, select b if sel is true, a otherwise.
static Simd4Double gmx_simdcall gmx::blend ( Simd4Double  a,
Simd4Double  b,
Simd4DBool  sel 
)
inlinestatic

Vector-blend SIMD4 selection.

Parameters
a  First source
b  Second source
sel  Boolean selector
Returns
For each element, select b if sel is true, a otherwise.
static SimdFloat gmx_simdcall gmx::blend ( SimdFloat  a,
SimdFloat  b,
SimdFBool  sel 
)
inlinestatic

Vector-blend SIMD float selection.

Parameters
a  First source
b  Second source
sel  Boolean selector
Returns
For each element, select b if sel is true, a otherwise.
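
The selection rule can be emulated element by element. In the following minimal sketch the function name blendSketch is hypothetical and the vector length stands in for the SIMD width; it only mirrors the documented semantics and is not how an implementation would be written.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One entry per SIMD element; the vector length stands in for the SIMD width.
inline std::vector<float> blendSketch(const std::vector<float>& a,
                                      const std::vector<float>& b,
                                      const std::vector<bool>&  sel)
{
    assert(a.size() == b.size() && a.size() == sel.size());
    std::vector<float> r(a.size());
    for (std::size_t i = 0; i < a.size(); i++)
    {
        r[i] = sel[i] ? b[i] : a[i]; // b where sel is true, a otherwise
    }
    return r;
}
```
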
static SimdDouble gmx_simdcall gmx::blend ( SimdDouble  a,
SimdDouble  b,
SimdDBool  sel 
)
inlinestatic

Vector-blend SIMD double selection.

Parameters
a  First source
b  Second source
sel  Boolean selector
Returns
For each element, select b if sel is true, a otherwise.
static SimdFInt32 gmx_simdcall gmx::blend ( SimdFInt32  a,
SimdFInt32  b,
SimdFIBool  sel 
)
inlinestatic

Vector-blend SIMD integer selection.

Available if GMX_SIMD_HAVE_FINT32_ARITHMETICS is 1.

Parameters
a  First source
b  Second source
sel  Boolean selector
Returns
For each element, select b if sel is true, a otherwise.
static SimdDInt32 gmx_simdcall gmx::blend ( SimdDInt32  a,
SimdDInt32  b,
SimdDIBool  sel 
)
inlinestatic

Vector-blend SIMD integer selection.

Available if GMX_SIMD_HAVE_DINT32_ARITHMETICS is 1.

Parameters
a  First source
b  Second source
sel  Boolean selector
Returns
For each element, select b if sel is true, a otherwise.
static SimdFIBool gmx_simdcall gmx::cvtB2IB ( SimdFBool  a)
inlinestatic

Convert from single precision boolean to corresponding integer boolean.

Parameters
a  SIMD floating-point boolean
Returns
SIMD integer boolean
static SimdDIBool gmx_simdcall gmx::cvtB2IB ( SimdDBool  a)
inlinestatic

Convert from double precision boolean to corresponding integer boolean.

Parameters
a  SIMD floating-point boolean
Returns
SIMD integer boolean
static SimdFloat gmx_simdcall gmx::cvtD2F ( SimdDouble gmx_unused  d)
inlinestatic

Convert SIMD double to float.

This version is available if GMX_SIMD_FLOAT_WIDTH is identical to GMX_SIMD_DOUBLE_WIDTH.

Float/double conversions are complex since the SIMD width could either be different (e.g. on x86) or identical (e.g. IBM QPX). This means you will need to check for the width in the code, and have different code paths.

Parameters
d  Double-precision SIMD variable
Returns
Single-precision SIMD variable of the same width
static SimdFloat gmx_simdcall gmx::cvtDD2F ( SimdDouble gmx_unused  d0,
SimdDouble gmx_unused  d1 
)
inlinestatic

Convert SIMD double to float.

This version is available if GMX_SIMD_FLOAT_WIDTH is twice as large as GMX_SIMD_DOUBLE_WIDTH.

Float/double conversions are complex since the SIMD width could either be different (e.g. on x86) or identical (e.g. IBM QPX). This means you will need to check for the width in the code, and have different code paths.

Parameters
d0  Double-precision SIMD variable, first half of values to put in f.
d1  Double-precision SIMD variable, second half of values to put in f.
Returns
Single-precision SIMD variable with all values.
static SimdDouble gmx_simdcall gmx::cvtF2D ( SimdFloat gmx_unused  f)
inlinestatic

Convert SIMD float to double.

This version is available if GMX_SIMD_FLOAT_WIDTH is identical to GMX_SIMD_DOUBLE_WIDTH.

Float/double conversions are complex since the SIMD width could either be different (e.g. on x86) or identical (e.g. IBM QPX). This means you will need to check for the width in the code, and have different code paths.

Parameters
f  Single-precision SIMD variable
Returns
Double-precision SIMD variable of the same width
static void gmx_simdcall gmx::cvtF2DD ( SimdFloat gmx_unused  f,
SimdDouble gmx_unused *  d0,
SimdDouble gmx_unused *  d1 
)
inlinestatic

Convert SIMD float to double.

This version is available if GMX_SIMD_FLOAT_WIDTH is twice as large as GMX_SIMD_DOUBLE_WIDTH.

Float/double conversions are complex since the SIMD width could either be different (e.g. on x86) or identical (e.g. IBM QPX). This means you will need to check for the width in the code, and have different code paths.

Parameters
f  Single-precision SIMD variable
[out]  d0  Double-precision SIMD variable, first half of values from f.
[out]  d1  Double-precision SIMD variable, second half of values from f.
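
To make the "float width is twice the double width" code path concrete, here is a minimal standalone sketch. The widths 8 and 4 and all names ending in Sketch are assumptions chosen for illustration; they mimic a 256-bit register and are not taken from any GROMACS implementation.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical widths, chosen to mimic a 256-bit x86-style register.
constexpr std::size_t c_floatWidthSketch  = 8;
constexpr std::size_t c_doubleWidthSketch = 4;
using FloatVecSketch                      = std::array<float, c_floatWidthSketch>;
using DoubleVecSketch                     = std::array<double, c_doubleWidthSketch>;

// One float register expands into two double registers:
// first half of f goes to d0, second half to d1.
inline void cvtF2DDSketch(const FloatVecSketch& f, DoubleVecSketch* d0, DoubleVecSketch* d1)
{
    static_assert(c_floatWidthSketch == 2 * c_doubleWidthSketch,
                  "this code path requires float width == 2 * double width");
    assert(d0 != nullptr && d1 != nullptr);
    for (std::size_t i = 0; i < c_doubleWidthSketch; i++)
    {
        (*d0)[i] = f[i];
        (*d1)[i] = f[i + c_doubleWidthSketch];
    }
}

// The reverse direction packs two double registers into one float register.
inline FloatVecSketch cvtDD2FSketch(const DoubleVecSketch& d0, const DoubleVecSketch& d1)
{
    FloatVecSketch f{};
    for (std::size_t i = 0; i < c_doubleWidthSketch; i++)
    {
        f[i]                       = static_cast<float>(d0[i]);
        f[i + c_doubleWidthSketch] = static_cast<float>(d1[i]);
    }
    return f;
}
```

When the two widths are identical, the single-register variants cvtF2D and cvtD2F apply instead, which is why calling code needs a compile-time check on the widths.
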
static SimdFloat gmx_simdcall gmx::cvtI2R ( SimdFInt32  a)
inlinestatic

Convert integer to single precision floating point.

Parameters
a  SIMD integer
Returns
SIMD floating-point
static SimdDouble gmx_simdcall gmx::cvtI2R ( SimdDInt32  a)
inlinestatic

Convert integer to double precision floating point.

Parameters
a  SIMD integer
Returns
SIMD floating-point
static SimdFBool gmx_simdcall gmx::cvtIB2B ( SimdFIBool  a)
inlinestatic

Convert from integer boolean to corresponding single precision boolean.

Parameters
a  SIMD integer boolean
Returns
SIMD floating-point boolean
static SimdDBool gmx_simdcall gmx::cvtIB2B ( SimdDIBool  a)
inlinestatic

Convert from integer boolean to corresponding double precision boolean.

Parameters
a  SIMD integer boolean
Returns
SIMD floating-point boolean
static SimdFInt32 gmx_simdcall gmx::cvtR2I ( SimdFloat  a)
inlinestatic

Round single precision floating point to integer.

Parameters
a  SIMD floating-point
Returns
SIMD integer, rounded to nearest integer.
Note
Round mode is implementation defined. The only guarantee is that it is consistent between rounding functions (round, cvtR2I).
static SimdDInt32 gmx_simdcall gmx::cvtR2I ( SimdDouble  a)
inlinestatic

Round double precision floating point to integer.

Parameters
a  SIMD floating-point
Returns
SIMD integer, rounded to nearest integer.
Note
Round mode is implementation defined. The only guarantee is that it is consistent between rounding functions (round, cvtR2I).
static SimdFInt32 gmx_simdcall gmx::cvttR2I ( SimdFloat  a)
inlinestatic

Truncate single precision floating point to integer.

Parameters
a  SIMD floating-point
Returns
SIMD integer, truncated toward zero.
static SimdDInt32 gmx_simdcall gmx::cvttR2I ( SimdDouble  a)
inlinestatic

Truncate double precision floating point to integer.

Parameters
a  SIMD floating-point
Returns
SIMD integer, truncated toward zero.
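
The difference between rounding (cvtR2I) and truncating (cvttR2I) can be shown with scalar stand-ins. In this sketch the function names are hypothetical, and std::nearbyint follows the current floating-point rounding mode, which is only an assumption standing in for the implementation-defined SIMD rounding mode.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Round using the current floating-point rounding mode
// (round-to-nearest-even unless the program changed it).
inline std::int32_t cvtR2ISketch(float x)
{
    assert(std::fabs(x) < 2147483648.0f); // result must fit in int32
    return static_cast<std::int32_t>(std::nearbyint(x));
}

// Truncate: discard the fractional part, i.e. round toward zero.
inline std::int32_t cvttR2ISketch(float x)
{
    assert(std::fabs(x) < 2147483648.0f); // result must fit in int32
    return static_cast<std::int32_t>(std::trunc(x));
}
```

For negative inputs the two differ in direction: rounding -2.7 gives -3, while truncating it gives -2.
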
static void gmx_simdcall gmx::decr3Hsimd ( double *  m,
SimdDouble  a0,
SimdDouble  a1,
SimdDouble  a2 
)
inlinestatic

Add the two halves of three SIMD doubles, subtract the sum from three half-SIMD-width consecutive doubles in memory.

Parameters
m  half-width aligned memory, from which sum of the halves will be subtracted.
a0  SIMD variable. The sum of its upper & lower halves is subtracted from the first half-width block of m.
a1  SIMD variable. The sum of its upper & lower halves is subtracted from the second half-width block of m.
a2  SIMD variable. The sum of its upper & lower halves is subtracted from the third half-width block of m.

If the SIMD width is 8 and the vectors contain [a0 b0 c0 d0 e0 f0 g0 h0], [a1 b1 c1 d1 e1 f1 g1 h1] and [a2 b2 c2 d2 e2 f2 g2 h2], the memory will be modified to [m[0]-(a0+e0) m[1]-(b0+f0) m[2]-(c0+g0) m[3]-(d0+h0) m[4]-(a1+e1) m[5]-(b1+f1) m[6]-(c1+g1) m[7]-(d1+h1) m[8]-(a2+e2) m[9]-(b2+f2) m[10]-(c2+g2) m[11]-(d2+h2)].

The memory must be aligned to half SIMD width.

Available if GMX_SIMD_HAVE_HSIMD_UTIL_DOUBLE is 1.
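
The width-8 example above can be reproduced with a plain-array emulation. This is a sketch only: DecrVec and decr3HsimdSketch are hypothetical names, the width of 8 is fixed for the illustration, and the alignment requirement on m is not modelled.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

constexpr std::size_t c_decrWidth = 8; // hypothetical SIMD width
constexpr std::size_t c_decrHalf  = c_decrWidth / 2;
using DecrVec                     = std::array<double, c_decrWidth>;

// For each of the three inputs, add the upper half to the lower half and
// subtract the result from the corresponding half-width block of m.
inline void decr3HsimdSketch(double* m, const DecrVec& a0, const DecrVec& a1, const DecrVec& a2)
{
    assert(m != nullptr);
    const DecrVec* inputs[3] = { &a0, &a1, &a2 };
    for (std::size_t k = 0; k < 3; k++)
    {
        for (std::size_t i = 0; i < c_decrHalf; i++)
        {
            m[k * c_decrHalf + i] -= (*inputs[k])[i] + (*inputs[k])[c_decrHalf + i];
        }
    }
}
```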

static void gmx_simdcall gmx::decr3Hsimd ( float *  m,
SimdFloat  a0,
SimdFloat  a1,
SimdFloat  a2 
)
inlinestatic

Add the two halves of three SIMD floats, subtract the sum from three half-SIMD-width consecutive floats in memory.

Parameters
m  half-width aligned memory, from which sum of the halves will be subtracted.
a0  SIMD variable. The sum of its upper & lower halves is subtracted from the first half-width block of m.
a1  SIMD variable. The sum of its upper & lower halves is subtracted from the second half-width block of m.
a2  SIMD variable. The sum of its upper & lower halves is subtracted from the third half-width block of m.

If the SIMD width is 8 and the vectors contain [a0 b0 c0 d0 e0 f0 g0 h0], [a1 b1 c1 d1 e1 f1 g1 h1] and [a2 b2 c2 d2 e2 f2 g2 h2], the memory will be modified to [m[0]-(a0+e0) m[1]-(b0+f0) m[2]-(c0+g0) m[3]-(d0+h0) m[4]-(a1+e1) m[5]-(b1+f1) m[6]-(c1+g1) m[7]-(d1+h1) m[8]-(a2+e2) m[9]-(b2+f2) m[10]-(c2+g2) m[11]-(d2+h2)].

The memory must be aligned to half SIMD width.

Available if GMX_SIMD_HAVE_HSIMD_UTIL_FLOAT is 1.

static float gmx_simdcall gmx::dotProduct ( Simd4Float  a,
Simd4Float  b 
)
inlinestatic

Return dot product of two single precision SIMD4 variables.

The dot product is calculated between the first three elements in the two vectors, while the fourth is ignored. The result is returned as a scalar.

Parameters
a  vector1
b  vector2
Returns
a[0]*b[0]+a[1]*b[1]+a[2]*b[2], returned as scalar. Last element is ignored.
static double gmx_simdcall gmx::dotProduct ( Simd4Double  a,
Simd4Double  b 
)
inlinestatic

Return dot product of two double precision SIMD4 variables.

The dot product is calculated between the first three elements in the two vectors, while the fourth is ignored. The result is returned as a scalar.

Parameters
a  vector1
b  vector2
Returns
a[0]*b[0]+a[1]*b[1]+a[2]*b[2], returned as scalar. Last element is ignored.
static void gmx_simdcall gmx::expandScalarsToTriplets ( SimdDouble  scalar,
SimdDouble *  triplets0,
SimdDouble *  triplets1,
SimdDouble *  triplets2 
)
inlinestatic

Expand each element of double SIMD variable into three identical consecutive elements in three SIMD outputs.

Parameters
scalar  Floating-point input, e.g. [s0 s1 s2 s3] if width=4.
[out]  triplets0  First output, e.g. [s0 s0 s0 s1] if width=4.
[out]  triplets1  Second output, e.g. [s1 s1 s2 s2] if width=4.
[out]  triplets2  Third output, e.g. [s2 s3 s3 s3] if width=4.

This routine is meant to be used for things like scalar-vector multiplication, where the vectors are stored in a merged format like [x0 y0 z0 x1 y1 z1 ...], while the scalars are stored as [s0 s1 s2...], and the data cannot easily be changed to SIMD-friendly layout.

In this case, load 3 full-width SIMD variables from the vector array (This will always correspond to GMX_SIMD_DOUBLE_WIDTH triplets), load a single full-width variable from the scalar array, and call this routine to expand the data. You can then simply multiply the first, second and third pair of SIMD variables, and store the three results back into a suitable vector-format array.

static void gmx_simdcall gmx::expandScalarsToTriplets ( SimdFloat  scalar,
SimdFloat *  triplets0,
SimdFloat *  triplets1,
SimdFloat *  triplets2 
)
inline static

Expand each element of float SIMD variable into three identical consecutive elements in three SIMD outputs.

Parameters
scalar  Floating-point input, e.g. [s0 s1 s2 s3] if width=4.
[out] triplets0  First output, e.g. [s0 s0 s0 s1] if width=4.
[out] triplets1  Second output, e.g. [s1 s1 s2 s2] if width=4.
[out] triplets2  Third output, e.g. [s2 s3 s3 s3] if width=4.

This routine is meant to be used for things like scalar-vector multiplication, where the vectors are stored in a merged format like [x0 y0 z0 x1 y1 z1 ...], while the scalars are stored as [s0 s1 s2 ...], and the data cannot easily be changed to a SIMD-friendly layout.

In this case, load 3 full-width SIMD variables from the vector array (This will always correspond to GMX_SIMD_FLOAT_WIDTH triplets), load a single full-width variable from the scalar array, and call this routine to expand the data. You can then simply multiply the first, second and third pair of SIMD variables, and store the three results back into a suitable vector-format array.
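The expansion and the scaling recipe described above can be sketched with plain arrays standing in for SIMD variables. This is only a scalar model under stated assumptions: expandScalarsToTripletsModel and scalePackedVectorsModel are hypothetical names, std::array replaces SimdFloat, and the lane count is a template parameter instead of GMX_SIMD_FLOAT_WIDTH.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Scalar model: lane j of the concatenation [triplets0 | triplets1 | triplets2]
// holds scalar[j / 3], so each input element appears three times in a row.
template<std::size_t width>
void expandScalarsToTripletsModel(const std::array<float, width>& scalar,
                                  std::array<float, width>*       triplets0,
                                  std::array<float, width>*       triplets1,
                                  std::array<float, width>*       triplets2)
{
    assert(triplets0 != nullptr && triplets1 != nullptr && triplets2 != nullptr);
    std::array<float, width>* out[3] = { triplets0, triplets1, triplets2 };
    for (std::size_t j = 0; j < 3 * width; j++)
    {
        (*out[j / width])[j % width] = scalar[j / 3];
    }
}

// Scale `width` packed vectors [x0 y0 z0 x1 y1 z1 ...] in place, one scalar
// per vector, following the load / expand / multiply / store recipe.
template<std::size_t width>
void scalePackedVectorsModel(const std::array<float, width>& scalar, float* xyz)
{
    std::array<float, width> t0, t1, t2;
    expandScalarsToTripletsModel(scalar, &t0, &t1, &t2);
    for (std::size_t i = 0; i < width; i++)
    {
        xyz[i] *= t0[i];             // first SIMD-width chunk of the vector array
        xyz[width + i] *= t1[i];     // second chunk
        xyz[2 * width + i] *= t2[i]; // third chunk
    }
}
```

With width=4 the three outputs reproduce the patterns listed in the parameter descriptions, and the three chunks together cover exactly four xyz triplets.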

template<int index>
static std::int32_t gmx_simdcall gmx::extract ( SimdFInt32  a)
inline static

Extract the element at position index from gmx::SimdFInt32.

Available if GMX_SIMD_HAVE_FINT32_EXTRACT is 1.

Template Parameters
index  Compile-time constant, position to extract (first position is 0)
Parameters
a  SIMD variable from which to extract value.
Returns
Single integer from position index in SIMD variable.
template<int index>
static std::int32_t gmx_simdcall gmx::extract ( SimdDInt32  a)
inline static

Extract the element at position index from gmx::SimdDInt32.

Available if GMX_SIMD_HAVE_DINT32_EXTRACT is 1.

Template Parameters
index  Compile-time constant, position to extract (first position is 0)
Parameters
a  SIMD variable from which to extract value.
Returns
Single integer from position index in SIMD variable.
static Simd4Float gmx_simdcall gmx::fma ( Simd4Float  a,
Simd4Float  b,
Simd4Float  c 
)
inline static

SIMD4 Fused-multiply-add. Result is a*b+c.

Parameters
a  factor1
b  factor2
c  term
Returns
a*b+c
static Simd4Double gmx_simdcall gmx::fma ( Simd4Double  a,
Simd4Double  b,
Simd4Double  c 
)
inline static

SIMD4 Fused-multiply-add. Result is a*b+c.

Parameters
a  factor1
b  factor2
c  term
Returns
a*b+c
static SimdFloat gmx_simdcall gmx::fma ( SimdFloat  a,
SimdFloat  b,
SimdFloat  c 
)
inline static

SIMD float Fused-multiply-add. Result is a*b+c.

Parameters
a  factor1
b  factor2
c  term
Returns
a*b+c
static SimdDouble gmx_simdcall gmx::fma ( SimdDouble  a,
SimdDouble  b,
SimdDouble  c 
)
inline static

SIMD double Fused-multiply-add. Result is a*b+c.

Parameters
a  factor1
b  factor2
c  term
Returns
a*b+c
static Simd4Float gmx_simdcall gmx::fms ( Simd4Float  a,
Simd4Float  b,
Simd4Float  c 
)
inline static

SIMD4 Fused-multiply-subtract. Result is a*b-c.

Parameters
a  factor1
b  factor2
c  term
Returns
a*b-c
static Simd4Double gmx_simdcall gmx::fms ( Simd4Double  a,
Simd4Double  b,
Simd4Double  c 
)
inline static

SIMD4 Fused-multiply-subtract. Result is a*b-c.

Parameters
a  factor1
b  factor2
c  term
Returns
a*b-c
static SimdFloat gmx_simdcall gmx::fms ( SimdFloat  a,
SimdFloat  b,
SimdFloat  c 
)
inline static

SIMD float Fused-multiply-subtract. Result is a*b-c.

Parameters
a  factor1
b  factor2
c  term
Returns
a*b-c
static SimdDouble gmx_simdcall gmx::fms ( SimdDouble  a,
SimdDouble  b,
SimdDouble  c 
)
inline static

SIMD double Fused-multiply-subtract. Result is a*b-c.

Parameters
a  factor1
b  factor2
c  term
Returns
a*b-c
static Simd4Double gmx_simdcall gmx::fnma ( Simd4Double  a,
Simd4Double  b,
Simd4Double  c 
)
inline static

SIMD4 Fused-negated-multiply-add. Result is -a*b+c.

Parameters
a  factor1
b  factor2
c  term
Returns
-a*b+c
static Simd4Float gmx_simdcall gmx::fnma ( Simd4Float  a,
Simd4Float  b,
Simd4Float  c 
)
inline static

SIMD4 Fused-negated-multiply-add. Result is -a*b+c.

Parameters
a  factor1
b  factor2
c  term
Returns
-a*b+c
static SimdFloat gmx_simdcall gmx::fnma ( SimdFloat  a,
SimdFloat  b,
SimdFloat  c 
)
inline static

SIMD float Fused-negated-multiply-add. Result is -a*b+c.

Parameters
a  factor1
b  factor2
c  term
Returns
-a*b+c
static SimdDouble gmx_simdcall gmx::fnma ( SimdDouble  a,
SimdDouble  b,
SimdDouble  c 
)
inline static

SIMD double Fused-negated-multiply-add. Result is -a*b+c.

Parameters
a  factor1
b  factor2
c  term
Returns
-a*b+c
static Simd4Double gmx_simdcall gmx::fnms ( Simd4Double  a,
Simd4Double  b,
Simd4Double  c 
)
inline static

SIMD4 Fused-negated-multiply-subtract. Result is -a*b-c.

Parameters
a  factor1
b  factor2
c  term
Returns
-a*b-c
static Simd4Float gmx_simdcall gmx::fnms ( Simd4Float  a,
Simd4Float  b,
Simd4Float  c 
)
inline static

SIMD4 Fused-negated-multiply-subtract. Result is -a*b-c.

Parameters
a  factor1
b  factor2
c  term
Returns
-a*b-c
static SimdFloat gmx_simdcall gmx::fnms ( SimdFloat  a,
SimdFloat  b,
SimdFloat  c 
)
inline static

SIMD float Fused-negated-multiply-subtract. Result is -a*b-c.

Parameters
a  factor1
b  factor2
c  term
Returns
-a*b-c
static SimdDouble gmx_simdcall gmx::fnms ( SimdDouble  a,
SimdDouble  b,
SimdDouble  c 
)
inline static

SIMD double Fused-negated-multiply-subtract. Result is -a*b-c.

Parameters
a  factor1
b  factor2
c  term
Returns
-a*b-c
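The four sign conventions of fma, fms, fnma and fnms are easy to mix up, so here is a per-lane scalar sketch of each. The *Model names are hypothetical and std::fma from the C++ standard library stands in for the SIMD operation; whether a particular SIMD backend performs a truly fused operation or a separate multiply and add is not stated in this section and is not assumed here.

```cpp
#include <cassert>
#include <cmath>

// Per-lane scalar models of the documented results.
inline double fmaModel(double a, double b, double c)
{
    return std::fma(a, b, c); //  a*b + c
}
inline double fmsModel(double a, double b, double c)
{
    return std::fma(a, b, -c); //  a*b - c
}
inline double fnmaModel(double a, double b, double c)
{
    return std::fma(-a, b, c); // -a*b + c
}
inline double fnmsModel(double a, double b, double c)
{
    return std::fma(-a, b, -c); // -a*b - c
}

// Usage sketch with a=2, b=3, c=4, where every result is exact.
inline void fusedSignConventionExample()
{
    assert(fmaModel(2.0, 3.0, 4.0) == 10.0);
    assert(fmsModel(2.0, 3.0, 4.0) == 2.0);
    assert(fnmaModel(2.0, 3.0, 4.0) == -2.0);
    assert(fnmsModel(2.0, 3.0, 4.0) == -10.0);
}
```

Note that in fnma and fnms the negation applies to the product only, not to the term c.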
template<MathOptimization opt = MathOptimization::Safe>
static SimdFloat gmx_simdcall gmx::frexp ( SimdFloat  value,
SimdFInt32 *  exponent 
)
inline static

Extract (integer) exponent and fraction from single precision SIMD.

Template Parameters
opt  By default this function behaves like the standard library: frexp(+-0,exp) returns +-0 and stores 0 in the exponent. If you know the argument is always nonzero, you can set the template parameter to MathOptimization::Unsafe to make it slightly faster.
Parameters
value  Floating-point value to extract from
[out] exponent  Returned exponent of value, integer SIMD format.
Returns
Fraction of value, floating-point SIMD format.
template<MathOptimization opt = MathOptimization::Safe>
static SimdDouble gmx_simdcall gmx::frexp ( SimdDouble  value,
SimdDInt32 *  exponent 
)
inline static

Extract (integer) exponent and fraction from double precision SIMD.

Template Parameters
opt  By default this function behaves like the standard library: frexp(+-0,exp) returns +-0 and stores 0 in the exponent. If you know the argument is always nonzero, you can set the template parameter to MathOptimization::Unsafe to make it slightly faster.
Parameters
value  Floating-point value to extract from
[out] exponent  Returned exponent of value, integer SIMD format.
Returns
Fraction of value, floating-point SIMD format.
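Since the default mode is described as behaving like the standard library, a lane-wise loop over std::frexp gives a reasonable mental model of it. In this sketch frexpModel is a hypothetical name and std::array stands in for the SIMD types; the fraction range 0.5 <= |fraction| < 1 is that of std::frexp and is assumed, not confirmed, for the SIMD routine.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>

// Lane-wise scalar model of the default (Safe) behaviour:
// value == fraction * 2^exponent, and a zero input gives fraction 0, exponent 0.
template<std::size_t width>
std::array<float, width> frexpModel(const std::array<float, width>&  value,
                                    std::array<std::int32_t, width>* exponent)
{
    assert(exponent != nullptr);
    std::array<float, width> fraction;
    for (std::size_t i = 0; i < width; i++)
    {
        int e       = 0;
        fraction[i] = std::frexp(value[i], &e);
        (*exponent)[i] = e;
    }
    return fraction;
}
```

For example, the lane value 8.0 splits into fraction 0.5 and exponent 4, while a lane holding 0.0 yields fraction 0.0 and exponent 0, which is the case the Unsafe setting is allowed to skip.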
template<int align>
static void gmx_simdcall gmx::gatherLoadBySimdIntTranspose ( const double *  base,
SimdDInt32  offset,
SimdDouble *  v0,
SimdDouble *  v1,
SimdDouble *  v2,
SimdDouble *  v3 
)
inline static

Load 4 consecutive doubles from each of GMX_SIMD_DOUBLE_WIDTH offsets specified by a SIMD integer, transpose into 4 SIMD double variables.

Template Parameters
align  Alignment of the memory from which we read, i.e. distance (measured in elements, not bytes) between index points. When this is identical to the number of SIMD variables (i.e., 4 for this routine) the input data is packed without padding in memory. See the output parameters below for exactly which memory positions are loaded.
Parameters
base  Aligned pointer to the start of the memory.
offset  SIMD integer type with offsets to the start of each group of 4 values.
[out] v0  First component, base[align*offset[i]] for each i.
[out] v1  Second component, base[align*offset[i] + 1] for each i.
[out] v2  Third component, base[align*offset[i] + 2] for each i.
[out] v3  Fourth component, base[align*offset[i] + 3] for each i.

The floating-point memory locations must be aligned, but only to the smaller of four elements and the floating-point SIMD width.

Note
You should NOT scale offsets before calling this routine; it is done internally by using the alignment template parameter instead.
This is a special routine primarily intended for loading Gromacs table data as efficiently as possible - this is the reason for using a SIMD offset index, since the result of the real-to-integer conversion is present in a SIMD register just before calling this routine.
template<int align>
static void gmx_simdcall gmx::gatherLoadBySimdIntTranspose ( const float *  base,
SimdFInt32  offset,
SimdFloat *  v0,
SimdFloat *  v1,
SimdFloat *  v2,
SimdFloat *  v3 
)
inline static

Load 4 consecutive floats from each of GMX_SIMD_FLOAT_WIDTH offsets specified by a SIMD integer, transpose into 4 SIMD float variables.

Template Parameters
align  Alignment of the memory from which we read, i.e. distance (measured in elements, not bytes) between index points. When this is identical to the number of SIMD variables (i.e., 4 for this routine) the input data is packed without padding in memory. See the output parameters below for exactly which memory positions are loaded.
Parameters
base  Aligned pointer to the start of the memory.
offset  SIMD integer type with offsets to the start of each group of 4 values.
[out] v0  First component, base[align*offset[i]] for each i.
[out] v1  Second component, base[align*offset[i] + 1] for each i.
[out] v2  Third component, base[align*offset[i] + 2] for each i.
[out] v3  Fourth component, base[align*offset[i] + 3] for each i.

The floating-point memory locations must be aligned, but only to the smaller of four elements and the floating-point SIMD width.

Note
You should NOT scale offsets before calling this routine; it is done internally by using the alignment template parameter instead.
This is a special routine primarily intended for loading Gromacs table data as efficiently as possible - this is the reason for using a SIMD offset index, since the result of the real-to-integer conversion is present in a SIMD register just before calling this routine.
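To make the indexing concrete, here is a scalar sketch of the four-component gather-and-transpose: lane i of output k receives base[align*offset[i] + k]. The name gatherLoadTransposeModel is hypothetical, std::array stands in for both the SIMD integer and floating-point types, and none of the alignment requirements of the real routine are modelled.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Scalar model: the offsets are unscaled indices; the multiplication by
// `align` happens inside the routine, as the note above requires.
template<int align, std::size_t width>
void gatherLoadTransposeModel(const float*                           base,
                              const std::array<std::int32_t, width>& offset,
                              std::array<float, width>*              v0,
                              std::array<float, width>*              v1,
                              std::array<float, width>*              v2,
                              std::array<float, width>*              v3)
{
    assert(v0 != nullptr && v1 != nullptr && v2 != nullptr && v3 != nullptr);
    for (std::size_t i = 0; i < width; i++)
    {
        const float* p = base + align * offset[i];
        (*v0)[i] = p[0];
        (*v1)[i] = p[1];
        (*v2)[i] = p[2];
        (*v3)[i] = p[3];
    }
}
```

With align=8, for instance, each table entry occupies eight elements of which only the first four are read, and an offset value of 3 refers to elements 24..27 of the table.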
template<int align>
static void gmx_simdcall gmx::gatherLoadBySimdIntTranspose ( const double *  base,
SimdDInt32  offset,
SimdDouble *  v0,
SimdDouble *  v1 
)
inline static

Load 2 consecutive doubles from each of GMX_SIMD_DOUBLE_WIDTH offsets specified by a SIMD integer, transpose into 2 SIMD double variables.

Template Parameters
align  Alignment of the memory from which we read, i.e. distance (measured in elements, not bytes) between index points. When this is identical to the number of SIMD variables (i.e., 2 for this routine) the input data is packed without padding in memory. See the output parameters below for exactly which memory positions are loaded.
Parameters
base  Aligned pointer to the start of the memory.
offset  SIMD integer type with offsets to the start of each pair.
[out] v0  First component, base[align*offset[i]] for each i.
[out] v1  Second component, base[align*offset[i] + 1] for each i.

The floating-point memory locations must be aligned, but only to the smaller of two elements and the floating-point SIMD width.

Note
You should NOT scale offsets before calling this routine; it is done internally by using the alignment template parameter instead.
This is a special routine primarily intended for loading Gromacs table data as efficiently as possible - this is the reason for using a SIMD offset index, since the result of the real-to-integer conversion is present in a SIMD register just before calling this routine.
template<int align>
static void gmx_simdcall gmx::gatherLoadBySimdIntTranspose ( const float *  base,
SimdFInt32  offset,
SimdFloat *  v0,
SimdFloat *  v1 
)
inline static

Load 2 consecutive floats from each of GMX_SIMD_FLOAT_WIDTH offsets specified by a SIMD integer, transpose into 2 SIMD float variables.

Template Parameters
align  Alignment of the memory from which we read, i.e. distance (measured in elements, not bytes) between index points. When this is identical to the number of SIMD variables (i.e., 2 for this routine) the input data is packed without padding in memory. See the output parameters below for exactly which memory positions are loaded.
Parameters
base  Aligned pointer to the start of the memory.
offset  SIMD integer type with offsets to the start of each pair.
[out] v0  First component, base[align*offset[i]] for each i.
[out] v1  Second component, base[align*offset[i] + 1] for each i.

The floating-point memory locations must be aligned, but only to the smaller of two elements and the floating-point SIMD width.

Note
You should NOT scale offsets before calling this routine; it is done internally by using the alignment template parameter instead.
This is a special routine primarily intended for loading Gromacs table data as efficiently as possible - this is the reason for using a SIMD offset index, since the result of the real-to-integer conversion is present in a SIMD register just before calling this routine.
template<int align>
static void gmx_simdcall gmx::gatherLoadTranspose ( const double *  base,
const std::int32_t  offset[],
SimdDouble *  v0,
SimdDouble *  v1,
SimdDouble *  v2,
SimdDouble *  v3 
)
inline static

Load 4 consecutive doubles from each of GMX_SIMD_DOUBLE_WIDTH offsets, and transpose into 4 SIMD double variables.

Template Parameters
align  Alignment of the memory from which we read, i.e. distance (measured in elements, not bytes) between index points. When this is identical to the number of SIMD variables (i.e., 4 for this routine) the input data is packed without padding in memory. See the output parameters below for exactly which memory positions are loaded.
Parameters
base  Pointer to the start of the memory area
offset  Array with offsets to the start of each data point.
[out] v0  1st component of data, base[align*offset[i]] for each i.
[out] v1  2nd component of data, base[align*offset[i] + 1] for each i.
[out] v2  3rd component of data, base[align*offset[i] + 2] for each i.
[out] v3  4th component of data, base[align*offset[i] + 3] for each i.

The floating-point memory locations must be aligned, but only to the smaller of four elements and the floating-point SIMD width.

The offset memory must be aligned to GMX_SIMD_DINT32_WIDTH.

Note
You should NOT scale offsets before calling this routine; it is done internally by using the alignment template parameter instead.
template<int align>
static void gmx_simdcall gmx::gatherLoadTranspose ( const float *  base,
const std::int32_t  offset[],
SimdFloat *  v0,
SimdFloat *  v1,
SimdFloat *  v2,
SimdFloat *  v3 
)
inline static

Load 4 consecutive floats from each of GMX_SIMD_FLOAT_WIDTH offsets, and transpose into 4 SIMD float variables.

Template Parameters
align  Alignment of the memory from which we read, i.e. distance (measured in elements, not bytes) between index points. When this is identical to the number of SIMD variables (i.e., 4 for this routine) the input data is packed without padding in memory. See the output parameters below for exactly which memory positions are loaded.
Parameters
base  Pointer to the start of the memory area
offset  Array with offsets to the start of each data point.
[out] v0  1st component of data, base[align*offset[i]] for each i.
[out] v1  2nd component of data, base[align*offset[i] + 1] for each i.
[out] v2  3rd component of data, base[align*offset[i] + 2] for each i.
[out] v3  4th component of data, base[align*offset[i] + 3] for each i.

The floating-point memory locations must be aligned, but only to the smaller of four elements and the floating-point SIMD width.

The offset memory must be aligned to GMX_SIMD_FINT32_WIDTH.

Note
You should NOT scale offsets before calling this routine; it is done internally by using the alignment template parameter instead.
template<int align>
static void gmx_simdcall gmx::gatherLoadTranspose ( const double *  base,
const std::int32_t  offset[],
SimdDouble *  v0,
SimdDouble *  v1 
)
inline static

Load 2 consecutive doubles from each of GMX_SIMD_DOUBLE_WIDTH offsets, and transpose into 2 SIMD double variables.

Template Parameters
align  Alignment of the memory from which we read, i.e. distance (measured in elements, not bytes) between index points. When this is identical to the number of SIMD variables (i.e., 2 for this routine) the input data is packed without padding in memory. See the output parameters below for exactly which memory positions are loaded.
Parameters
base  Pointer to the start of the memory area
offset  Array with offsets to the start of each data point.
[out] v0  1st component of data, base[align*offset[i]] for each i.
[out] v1  2nd component of data, base[align*offset[i] + 1] for each i.

The floating-point memory locations must be aligned, but only to the smaller of two elements and the floating-point SIMD width.

The offset memory must be aligned to GMX_SIMD_DINT32_WIDTH.

Note
You should NOT scale offsets before calling this routine; it is done internally by using the alignment template parameter instead.
template<int align>
static void gmx_simdcall gmx::gatherLoadTranspose ( const float *  base,
const std::int32_t  offset[],
SimdFloat *  v0,
SimdFloat *  v1 
)
inline static

Load 2 consecutive floats from each of GMX_SIMD_FLOAT_WIDTH offsets, and transpose into 2 SIMD float variables.

Template Parameters
align  Alignment of the memory from which we read, i.e. distance (measured in elements, not bytes) between index points. When this is identical to the number of SIMD variables (i.e., 2 for this routine) the input data is packed without padding in memory. See the output parameters below for exactly which memory positions are loaded.
Parameters
base  Pointer to the start of the memory area
offset  Array with offsets to the start of each data point.
[out] v0  1st component of data, base[align*offset[i]] for each i.
[out] v1  2nd component of data, base[align*offset[i] + 1] for each i.

The floating-point memory locations must be aligned, but only to the smaller of two elements and the floating-point SIMD width.

The offset memory must be aligned to GMX_SIMD_FINT32_WIDTH.

To achieve the best possible performance, you should store your data with alignment c_simdBestPairAlignmentFloat in single, or c_simdBestPairAlignmentDouble in double.

Note
You should NOT scale offsets before calling this routine; it is done internally by using the alignment template parameter instead.
template<int align>
static void gmx_simdcall gmx::gatherLoadTransposeHsimd ( const double *  base0,
const double *  base1,
std::int32_t  offset[],
SimdDouble *  v0,
SimdDouble *  v1 
)
inline static

Load 2 consecutive doubles from each of GMX_SIMD_DOUBLE_WIDTH/2 offsets, transpose into SIMD double (low half from base0, high from base1).

Template Parameters
align  Alignment of the storage, i.e. the distance (measured in elements, not bytes) between index points. When this is identical to the number of output components the data is packed without padding. This must be a multiple of the alignment to keep all data aligned.
Parameters
base0  Pointer to base of first aligned memory
base1  Pointer to base of second aligned memory
offset  Offset to the start of each pair
[out] v0  1st element in each pair, base0 in low and base1 in high half.
[out] v1  2nd element in each pair, base0 in low and base1 in high half.

The offset array should be of half the SIMD width length, so it corresponds to the half-SIMD-register operations. This also means it must be aligned to half the integer SIMD width (i.e., GMX_SIMD_DINT32_WIDTH/2).

The floating-point memory locations must be aligned, but only to the smaller of two elements and the floating-point SIMD width.

This routine is primarily designed to load nonbonded parameters in the kernels. It is the equivalent of the full-width routine gatherLoadTranspose(), but just as the other hsimd routines it will pick half-SIMD-width data from base0 and put in the lower half, while the upper half comes from base1.

For an example, assume the SIMD width is 8, align is 2, that base0 is [A0 A1 B0 B1 C0 C1 D0 D1 ...], and base1 [E0 E1 F0 F1 G0 G1 H0 H1...].

Then we will get v0 as [A0 B0 C0 D0 E0 F0 G0 H0] and v1 as [A1 B1 C1 D1 E1 F1 G1 H1].

Available if GMX_SIMD_HAVE_HSIMD_UTIL_DOUBLE is 1.

template<int align>
static void gmx_simdcall gmx::gatherLoadTransposeHsimd ( const float *  base0,
const float *  base1,
const std::int32_t  offset[],
SimdFloat *  v0,
SimdFloat *  v1 
)
inline static

Load 2 consecutive floats from each of GMX_SIMD_FLOAT_WIDTH/2 offsets, transpose into SIMD float (low half from base0, high from base1).

Template Parameters
align  Alignment of the storage, i.e. the distance (measured in elements, not bytes) between index points. When this is identical to the number of output components the data is packed without padding. This must be a multiple of the alignment to keep all data aligned.
Parameters
base0  Pointer to base of first aligned memory
base1  Pointer to base of second aligned memory
offset  Offset to the start of each pair
[out] v0  1st element in each pair, base0 in low and base1 in high half.
[out] v1  2nd element in each pair, base0 in low and base1 in high half.

The offset array should be of half the SIMD width length, so it corresponds to the half-SIMD-register operations. This also means it must be aligned to half the integer SIMD width (i.e., GMX_SIMD_FINT32_WIDTH/2).

The floating-point memory locations must be aligned, but only to the smaller of two elements and the floating-point SIMD width.

This routine is primarily designed to load nonbonded parameters in the kernels. It is the equivalent of the full-width routine gatherLoadTranspose(), but just as the other hsimd routines it will pick half-SIMD-width data from base0 and put in the lower half, while the upper half comes from base1.

For an example, assume the SIMD width is 8, align is 2, that base0 is [A0 A1 B0 B1 C0 C1 D0 D1 ...], and base1 [E0 E1 F0 F1 G0 G1 H0 H1...].

Then we will get v0 as [A0 B0 C0 D0 E0 F0 G0 H0] and v1 as [A1 B1 C1 D1 E1 F1 G1 H1].

Available if GMX_SIMD_HAVE_HSIMD_UTIL_FLOAT is 1.
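The worked example with A0..H1 can be reproduced by a scalar sketch in which plain arrays stand in for SIMD variables. The name gatherLoadTransposeHsimdModel is hypothetical, and the sketch assumes the offsets are [0 1 2 3], which the example appears to imply but does not state; alignment requirements are not modelled.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Scalar model: the same half-width offset list is applied to both bases.
// Lanes [0, width/2) come from base0, lanes [width/2, width) from base1.
template<int align, std::size_t width>
void gatherLoadTransposeHsimdModel(const float*              base0,
                                   const float*              base1,
                                   const std::int32_t*       offset,
                                   std::array<float, width>* v0,
                                   std::array<float, width>* v1)
{
    static_assert(width % 2 == 0, "the model needs an even number of lanes");
    assert(v0 != nullptr && v1 != nullptr);
    constexpr std::size_t half = width / 2;
    for (std::size_t i = 0; i < half; i++)
    {
        const int start   = align * offset[i];
        (*v0)[i]          = base0[start];
        (*v1)[i]          = base0[start + 1];
        (*v0)[half + i]   = base1[start];
        (*v1)[half + i]   = base1[start + 1];
    }
}
```

Calling the model with width 8, align 2 and offsets {0, 1, 2, 3} places the first element of each pair from base0 and then base1 in v0, and the second elements in v1, matching the layout given in the example.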

template<int align>
static void gmx_simdcall gmx::gatherLoadUBySimdIntTranspose ( const double *  base,
SimdDInt32  offset,
SimdDouble *  v0,
SimdDouble *  v1 
)
inline static

Load 2 consecutive doubles from each of GMX_SIMD_DOUBLE_WIDTH offsets (unaligned) specified by SIMD integer, transpose into 2 SIMD doubles.

Template Parameters
align  Alignment of the memory from which we read, i.e. distance (measured in elements, not bytes) between index points. When this is identical to the number of SIMD variables (i.e., 2 for this routine) the input data is packed without padding in memory. See the output parameters below for exactly which memory positions are loaded.
Parameters
base  Pointer to the start of the memory.
offset  SIMD integer type with offsets to the start of each pair.
[out] v0  First component, base[align*offset[i]] for each i.
[out] v1  Second component, base[align*offset[i] + 1] for each i.

Since some SIMD architectures cannot handle any unaligned loads, this routine is only available if GMX_SIMD_HAVE_GATHER_LOADU_BYSIMDINT_TRANSPOSE is 1.

Note
You should NOT scale offsets before calling this routine; it is done internally by using the alignment template parameter instead.
This is a special routine primarily intended for loading Gromacs table data as efficiently as possible - this is the reason for using a SIMD offset index, since the result of the real-to-integer conversion is present in a SIMD register just before calling this routine.
template<int align>
static void gmx_simdcall gmx::gatherLoadUBySimdIntTranspose ( const float *  base,
SimdFInt32  offset,
SimdFloat *  v0,
SimdFloat *  v1 
)
inline static

Load 2 consecutive floats from each of GMX_SIMD_FLOAT_WIDTH offsets (unaligned) specified by SIMD integer, transpose into 2 SIMD floats.

Template Parameters
align  Alignment of the memory from which we read, i.e. distance (measured in elements, not bytes) between index points. When this is identical to the number of SIMD variables (i.e., 2 for this routine) the input data is packed without padding in memory. See the output parameters below for exactly which memory positions are loaded.
Parameters
base  Pointer to the start of the memory.
offset  SIMD integer type with offsets to the start of each pair.
[out] v0  First component, base[align*offset[i]] for each i.
[out] v1  Second component, base[align*offset[i] + 1] for each i.

Since some SIMD architectures cannot handle any unaligned loads, this routine is only available if GMX_SIMD_HAVE_GATHER_LOADU_BYSIMDINT_TRANSPOSE is 1.

Note
You should NOT scale offsets before calling this routine; it is done internally by using the alignment template parameter instead.
This is a special routine primarily intended for loading Gromacs table data as efficiently as possible - this is the reason for using a SIMD offset index, since the result of the real-to-integer conversion is present in a SIMD register just before calling this routine.
template<int align>
static void gmx_simdcall gmx::gatherLoadUTranspose ( const double *  base,
const std::int32_t  offset[],
SimdDouble *  v0,
SimdDouble *  v1,
SimdDouble *  v2 
)
inline static

Load 3 consecutive doubles from each of GMX_SIMD_DOUBLE_WIDTH offsets, and transpose into 3 SIMD double variables.

Template Parameters
align  Alignment of the memory from which we read, i.e. distance (measured in elements, not bytes) between index points. When this is identical to the number of SIMD variables (i.e., 3 for this routine) the input data is packed without padding in memory. See the output parameters below for exactly which memory positions are loaded.
Parameters
base  Pointer to the start of the memory area
offset  Array with offsets to the start of each data point.
[out] v0  1st component of data, base[align*offset[i]] for each i.
[out] v1  2nd component of data, base[align*offset[i] + 1] for each i.
[out] v2  3rd component of data, base[align*offset[i] + 2] for each i.

This function can work with both aligned (better performance) and unaligned memory. When the align parameter is not a power-of-two (align==3 would be normal for packed atomic coordinates) the memory obviously cannot be aligned, and we account for this. However, in the case where align is a power-of-two, we assume the base pointer also has the same alignment, which will enable many platforms to use faster aligned memory load operations. An easy way to think of this is that each triplet of data in memory must be aligned to the align parameter you specify when it's a power-of-two.

The offset memory must always be aligned to GMX_SIMD_DINT32_WIDTH, since this enables us to use SIMD loads and gather operations on platforms that support it.

Note
You should NOT scale offsets before calling this routine; it is done internally by using the alignment template parameter instead.
This routine uses a normal array for the offsets, since we typically load this data from memory. On the architectures we have tested this is faster even when a SIMD integer datatype is present.
To improve performance, this function might use full-SIMD-width unaligned loads. This means you need to ensure the memory is padded at the end, so we always can load GMX_SIMD_REAL_WIDTH elements starting at the last offset. If you use the Gromacs aligned memory allocation routines this will always be the case.
template<int align>
static void gmx_simdcall gmx::gatherLoadUTranspose ( const float *  base,
const std::int32_t  offset[],
SimdFloat *  v0,
SimdFloat *  v1,
SimdFloat *  v2 
)
inline static

Load 3 consecutive floats from each of GMX_SIMD_FLOAT_WIDTH offsets, and transpose into 3 SIMD float variables.

Template Parameters
align  Alignment of the memory from which we read, i.e. distance (measured in elements, not bytes) between index points. When this is identical to the number of SIMD variables (i.e., 3 for this routine) the input data is packed without padding in memory. See the output parameters below for exactly which memory positions are loaded.
Parameters
base  Pointer to the start of the memory area
offset  Array with offsets to the start of each data point.
[out] v0  1st component of data, base[align*offset[i]] for each i.
[out] v1  2nd component of data, base[align*offset[i] + 1] for each i.
[out] v2  3rd component of data, base[align*offset[i] + 2] for each i.

This function can work with both aligned (better performance) and unaligned memory. When the align parameter is not a power-of-two (align==3 would be normal for packed atomic coordinates) the memory obviously cannot be aligned, and we account for this. However, in the case where align is a power-of-two, we assume the base pointer also has the same alignment, which will enable many platforms to use faster aligned memory load operations. An easy way to think of this is that each triplet of data in memory must be aligned to the align parameter you specify when it's a power-of-two.

The offset memory must always be aligned to GMX_SIMD_FINT32_WIDTH, since this enables us to use SIMD loads and gather operations on platforms that support it.

Note
You should NOT scale offsets before calling this routine; it is done internally by using the alignment template parameter instead.
This routine uses a normal array for the offsets, since we typically load this data from memory. On the architectures we have tested this is faster even when a SIMD integer datatype is present.
To improve performance, this function might use full-SIMD-width unaligned loads. This means you need to ensure the memory is padded at the end, so we always can load GMX_SIMD_REAL_WIDTH elements starting at the last offset. If you use the Gromacs aligned memory allocation routines this will always be the case.
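For the common case of packed atomic coordinates (align==3), the access pattern can be sketched in scalar code as follows. The name gatherLoadUTransposeModel is hypothetical and std::array stands in for the SIMD variables. Unlike the real routine, this model reads exactly three elements per offset, so it does not need the end-of-array padding mentioned in the note above.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Scalar model: gather x, y and z for `width` atoms from a packed
// [x0 y0 z0 x1 y1 z1 ...] array. Offsets are atom indices, not element indices.
template<int align, std::size_t width>
void gatherLoadUTransposeModel(const float*              base,
                               const std::int32_t*       offset,
                               std::array<float, width>* v0,
                               std::array<float, width>* v1,
                               std::array<float, width>* v2)
{
    assert(v0 != nullptr && v1 != nullptr && v2 != nullptr);
    for (std::size_t i = 0; i < width; i++)
    {
        const float* p = base + align * offset[i];
        (*v0)[i] = p[0];
        (*v1)[i] = p[1];
        (*v2)[i] = p[2];
    }
}
```

After the call, v0, v1 and v2 hold the x, y and z components of the selected atoms in offset order, which is the layout SIMD arithmetic on coordinates typically wants.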
static void gmx_simdcall gmx::incrDualHsimd ( double *  m0,
double *  m1,
SimdDouble  a 
)
inline static

Add each half of a SIMD variable to separate memory addresses.

Parameters
m0  Pointer to memory aligned to half SIMD width.
m1  Pointer to memory aligned to half SIMD width.
a  SIMD variable. Lower half will be added to m0, upper half to m1.

The memory must be aligned to half SIMD width.

Note
The updated m0 value is written before m1 is read from memory, so the result will be correct even if the memory regions overlap.

Available if GMX_SIMD_HAVE_HSIMD_UTIL_DOUBLE is 1.

static void gmx_simdcall gmx::incrDualHsimd ( float *  m0,
float *  m1,
SimdFloat  a 
)
inline static

Add each half of a SIMD variable to separate memory addresses.

Parameters
m0  Pointer to memory aligned to half SIMD width.
m1  Pointer to memory aligned to half SIMD width.
a  SIMD variable. Lower half will be added to m0, upper half to m1.

The memory must be aligned to half SIMD width.

Note
The updated m0 value is written before m1 is read from memory, so the result will be correct even if the memory regions overlap.

Available if GMX_SIMD_HAVE_HSIMD_UTIL_FLOAT is 1.
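The ordering guarantee in the note matters when m0 and m1 point to the same or overlapping memory. A scalar sketch of the documented behaviour, with the hypothetical name incrDualHsimdModel and std::array standing in for SimdFloat, could look like this:

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Scalar model: the lower half of `a` is added to m0, the upper half to m1.
// The m0 update is completed before m1 is read, so overlapping regions
// accumulate both contributions.
template<std::size_t width>
void incrDualHsimdModel(float* m0, float* m1, const std::array<float, width>& a)
{
    static_assert(width % 2 == 0, "the model needs an even number of lanes");
    assert(m0 != nullptr && m1 != nullptr);
    constexpr std::size_t half = width / 2;
    for (std::size_t i = 0; i < half; i++)
    {
        m0[i] += a[i];
    }
    for (std::size_t i = 0; i < half; i++)
    {
        m1[i] += a[half + i];
    }
}
```

If both pointers refer to the same buffer, each element ends up incremented by the sum of the corresponding lower-half and upper-half lanes.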

template<MathOptimization opt = MathOptimization::Safe>
static SimdFloat gmx_simdcall gmx::ldexp ( SimdFloat  value,
SimdFInt32  exponent 
)
inline static

Multiply a SIMD float value by 2 raised to the power exponent.

Template Parameters
opt  By default, this routine will return zero for input arguments that are so small they cannot be reproduced in the current precision. If the unsafe math optimization template parameter setting is used, these tests are skipped, and the result will be undefined (possibly even NaN). This might happen below -127 in single precision or -1023 in double, although some implementations might use denormal support to extend the range.
Parameters
value  Floating-point number to multiply with new exponent
exponent  Integer that will not overflow as 2^exponent.
Returns
value*2^exponent
template<MathOptimization opt = MathOptimization::Safe>
static SimdDouble gmx_simdcall gmx::ldexp ( SimdDouble  value,
SimdDInt32  exponent 
)
inline static

Multiply a SIMD double value by 2 raised to the power exponent.

Template Parameters
opt  By default, this routine will return zero for input arguments that are so small they cannot be reproduced in the current precision. If the unsafe math optimization template parameter setting is used, these tests are skipped, and the result will be undefined (possibly even NaN). This might happen below -127 in single precision or -1023 in double, although some implementations might use denormal support to extend the range.
Parameters
value  Floating-point number to multiply with new exponent
exponent  Integer that will not overflow as 2^exponent.
Returns
value*2^exponent
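
As a rough illustration of the value*2^exponent contract, the sketch below applies std::ldexp lane by lane. The array types, the 4-lane width, and the name ldexpSketch are assumptions made for this example. It does not model the Safe/Unsafe template parameter: std::ldexp uses denormals for very small results rather than returning zero.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>

// Hypothetical 4-lane stand-ins for SimdDouble and SimdDInt32.
constexpr std::size_t c_width = 4;
using FakeSimdDouble          = std::array<double, c_width>;
using FakeSimdDInt32          = std::array<std::int32_t, c_width>;

// Scalar model of value * 2^exponent for each lane.
inline FakeSimdDouble ldexpSketch(const FakeSimdDouble& value, const FakeSimdDInt32& exponent)
{
    FakeSimdDouble result{};
    for (std::size_t i = 0; i < c_width; i++)
    {
        // Precondition from the parameter description: 2^exponent must not overflow.
        assert(exponent[i] <= 1023);
        result[i] = std::ldexp(value[i], exponent[i]);
    }
    return result;
}
```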
static Simd4Float gmx_simdcall gmx::load4 ( const float *  m)
inlinestatic

Load 4 float values from aligned memory into SIMD4 variable.

Parameters
mPointer to memory aligned to 4 elements.
Returns
SIMD4 variable with data loaded.
static Simd4Double gmx_simdcall gmx::load4 ( const double *  m)
inlinestatic

Load 4 double values from aligned memory into SIMD4 variable.

Parameters
mPointer to memory aligned to 4 elements.
Returns
SIMD4 variable with data loaded.
static SimdDouble gmx_simdcall gmx::load4DuplicateN ( const double *  m)
inlinestatic

Load 4 doubles and duplicate them N times each.

Parameters
mPointer to memory aligned to 4 doubles
Returns
SIMD variable with 4 doubles from m duplicated Nx.

Available if GMX_SIMD_HAVE_4NSIMD_UTIL_DOUBLE is 1. N is GMX_SIMD_DOUBLE_WIDTH/4. Different values are contiguous and same values are 4 positions in SIMD apart.

static SimdFloat gmx_simdcall gmx::load4DuplicateN ( const float *  m)
inlinestatic

Load 4 floats and duplicate them N times each.

Parameters
mPointer to memory aligned to 4 floats
Returns
SIMD variable with 4 floats from m duplicated Nx.

Available if GMX_SIMD_HAVE_4NSIMD_UTIL_FLOAT is 1. N is GMX_SIMD_FLOAT_WIDTH/4. Different values are contiguous and same values are 4 positions in SIMD apart.
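
The resulting element layout may be easier to see in a scalar model. The sketch below assumes a hypothetical 8-lane register (so N = 2); FakeSimdFloat and load4DuplicateNSketch are illustrative names, not GROMACS identifiers.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical 8-lane stand-in for SimdFloat, giving N = 8/4 = 2.
constexpr std::size_t c_width = 8;
using FakeSimdFloat           = std::array<float, c_width>;

// Scalar model: lane i receives m[i % 4], so the block m[0..3] repeats N times.
inline FakeSimdFloat load4DuplicateNSketch(const float* m)
{
    assert(m != nullptr);
    FakeSimdFloat result{};
    for (std::size_t i = 0; i < c_width; i++)
    {
        result[i] = m[i % 4];
    }
    return result;
}
```

For m = {1, 2, 3, 4} this model yields the lanes 1 2 3 4 1 2 3 4: different values sit next to each other, and copies of the same value are four lanes apart.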

static Simd4Float gmx_simdcall gmx::load4U ( const float *  m)
inlinestatic

Load SIMD4 float from unaligned memory.

Available if GMX_SIMD_HAVE_LOADU is 1.

Parameters
mPointer to memory, no alignment requirement.
Returns
SIMD4 variable with data loaded.
static Simd4Double gmx_simdcall gmx::load4U ( const double *  m)
inlinestatic

Load SIMD4 double from unaligned memory.

Available if GMX_SIMD_HAVE_LOADU is 1.

Parameters
mPointer to memory, no alignment requirement.
Returns
SIMD4 variable with data loaded.
static SimdDouble gmx_simdcall gmx::loadDualHsimd ( const double *  m0,
const double *  m1 
)
inlinestatic

Load low & high parts of SIMD double from different locations.

Parameters
m0Pointer to memory aligned to half SIMD width.
m1Pointer to memory aligned to half SIMD width.
Returns
SIMD variable with low part loaded from m0, high from m1.

Available if GMX_SIMD_HAVE_HSIMD_UTIL_DOUBLE is 1.

static SimdFloat gmx_simdcall gmx::loadDualHsimd ( const float *  m0,
const float *  m1 
)
inlinestatic

Load low & high parts of SIMD float from different locations.

Parameters
m0Pointer to memory aligned to half SIMD width.
m1Pointer to memory aligned to half SIMD width.
Returns
SIMD variable with low part loaded from m0, high from m1.

Available if GMX_SIMD_HAVE_HSIMD_UTIL_FLOAT is 1.
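
A scalar sketch of the low/high split, assuming a hypothetical 8-lane register. FakeSimdFloat and loadDualHalvesSketch are names made up for this example.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical 8-lane stand-in for SimdFloat; each half holds 4 floats.
constexpr std::size_t c_width = 8;
constexpr std::size_t c_half  = c_width / 2;
using FakeSimdFloat           = std::array<float, c_width>;

// Scalar model: the low half comes from m0[0..3], the high half from m1[0..3].
inline FakeSimdFloat loadDualHalvesSketch(const float* m0, const float* m1)
{
    assert(m0 != nullptr && m1 != nullptr);
    FakeSimdFloat result{};
    for (std::size_t i = 0; i < c_half; i++)
    {
        result[i]          = m0[i];
        result[c_half + i] = m1[i];
    }
    return result;
}
```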

static SimdDouble gmx_simdcall gmx::loadDuplicateHsimd ( const double *  m)
inlinestatic

Load half-SIMD-width double data, spread to both halves.

Parameters
mPointer to memory aligned to half SIMD width.
Returns
SIMD variable with both halves loaded from m.

Available if GMX_SIMD_HAVE_HSIMD_UTIL_DOUBLE is 1.

static SimdFloat gmx_simdcall gmx::loadDuplicateHsimd ( const float *  m)
inlinestatic

Load half-SIMD-width float data, spread to both halves.

Parameters
mPointer to memory aligned to half SIMD width.
Returns
SIMD variable with both halves loaded from m.

Available if GMX_SIMD_HAVE_HSIMD_UTIL_FLOAT is 1.

static SimdDouble gmx_simdcall gmx::loadU1DualHsimd ( const double *  m)
inlinestatic

Load two doubles, spread 1st in low half, 2nd in high half.

Parameters
mPointer to two adjacent double values.
Returns
SIMD variable where all elements in the low half have been set to m[0], and all elements in high half to m[1].
Note
This routine always loads two values and sets the halves separately. If you want to set all elements to the same value, simply use the standard (non-half-SIMD) operations.

Available if GMX_SIMD_HAVE_HSIMD_UTIL_DOUBLE is 1.

static SimdFloat gmx_simdcall gmx::loadU1DualHsimd ( const float *  m)
inlinestatic

Load two floats, spread 1st in low half, 2nd in high half.

Parameters
mPointer to two adjacent float values.
Returns
SIMD variable where all elements in the low half have been set to m[0], and all elements in high half to m[1].
Note
This routine always loads two values and sets the halves separately. If you want to set all elements to the same value, simply use the standard (non-half-SIMD) operations.

Available if GMX_SIMD_HAVE_HSIMD_UTIL_FLOAT is 1.
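
To make the "two values, two halves" behaviour concrete, here is a scalar sketch under the assumption of a hypothetical 8-lane register. FakeSimdFloat and loadU1DualHalvesSketch are illustrative names only.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical 8-lane stand-in for SimdFloat; each half holds 4 floats.
constexpr std::size_t c_width = 8;
constexpr std::size_t c_half  = c_width / 2;
using FakeSimdFloat           = std::array<float, c_width>;

// Scalar model: broadcast m[0] into the low half and m[1] into the high half.
inline FakeSimdFloat loadU1DualHalvesSketch(const float* m)
{
    assert(m != nullptr);
    FakeSimdFloat result{};
    for (std::size_t i = 0; i < c_width; i++)
    {
        result[i] = (i < c_half) ? m[0] : m[1];
    }
    return result;
}
```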

static SimdDouble gmx_simdcall gmx::loadU4NOffset ( const double *  m,
int  offset 
)
inlinestatic

Load doubles in blocks of 4 at fixed offsets.

Parameters
mPointer to unaligned memory
offsetOffset in memory between input blocks of 4
Returns
SIMD variable with doubles from m.

Available if GMX_SIMD_HAVE_4NSIMD_UTIL_DOUBLE is 1. Blocks of 4 doubles are loaded from m+n*offset where n is the n-th block of 4 doubles.

static SimdFloat gmx_simdcall gmx::loadU4NOffset ( const float *  m,
int  offset 
)
inlinestatic

Load floats in blocks of 4 at fixed offsets.

Parameters
mPointer to unaligned memory
offsetOffset in memory between input blocks of 4
Returns
SIMD variable with floats from m.

Available if GMX_SIMD_HAVE_4NSIMD_UTIL_FLOAT is 1. Blocks of 4 floats are loaded from m+n*offset where n is the n-th block of 4 floats.
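
The m+n*offset addressing can be sketched in scalar code as follows. The 8-lane width (two blocks of 4), FakeSimdFloat, and loadU4NOffsetSketch are assumptions for illustration.

```cpp
#include <array>
#include <cassert>

// Hypothetical 8-lane stand-in for SimdFloat, i.e. two blocks of 4 floats.
constexpr int c_width = 8;
using FakeSimdFloat   = std::array<float, c_width>;

// Scalar model: block n (lanes 4n..4n+3) is read from m + n*offset.
inline FakeSimdFloat loadU4NOffsetSketch(const float* m, int offset)
{
    assert(m != nullptr);
    FakeSimdFloat result{};
    for (int n = 0; n < c_width / 4; n++)
    {
        for (int j = 0; j < 4; j++)
        {
            result[4 * n + j] = m[n * offset + j];
        }
    }
    return result;
}
```

With m holding 0, 1, ..., 11 and offset = 6, this model returns the lanes 0 1 2 3 6 7 8 9.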

static SimdDouble gmx_simdcall gmx::loadUNDuplicate4 ( const double *  m)
inlinestatic

Load N doubles and duplicate them 4 times each.

Parameters
mPointer to unaligned memory
Returns
SIMD variable with N doubles from m duplicated 4x.

Available if GMX_SIMD_HAVE_4NSIMD_UTIL_DOUBLE is 1. N is GMX_SIMD_DOUBLE_WIDTH/4. Duplicated values are contiguous and different values are 4 positions in SIMD apart.

static SimdFloat gmx_simdcall gmx::loadUNDuplicate4 ( const float *  m)
inlinestatic

Load N floats and duplicate them 4 times each.

Parameters
mPointer to unaligned memory
Returns
SIMD variable with N floats from m duplicated 4x.

Available if GMX_SIMD_HAVE_4NSIMD_UTIL_FLOAT is 1. N is GMX_SIMD_FLOAT_WIDTH/4. Duplicated values are contiguous and different values are 4 positions in SIMD apart.
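
In this layout each input value fills four adjacent lanes. A scalar sketch, assuming a hypothetical 8-lane register (N = 2) and the made-up names FakeSimdFloat and loadUNDuplicate4Sketch:

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical 8-lane stand-in for SimdFloat, giving N = 8/4 = 2.
constexpr std::size_t c_width = 8;
using FakeSimdFloat           = std::array<float, c_width>;

// Scalar model: lane i receives m[i / 4], so each input fills 4 adjacent lanes.
inline FakeSimdFloat loadUNDuplicate4Sketch(const float* m)
{
    assert(m != nullptr);
    FakeSimdFloat result{};
    for (std::size_t i = 0; i < c_width; i++)
    {
        result[i] = m[i / 4];
    }
    return result;
}
```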

static SimdFloat gmx_simdcall gmx::maskAdd ( SimdFloat  a,
SimdFloat  b,
SimdFBool  m 
)
inlinestatic

Add two float SIMD variables, masked version.

Parameters
aterm1
bterm2
mmask
Returns
a+b where mask is true, a otherwise.
static SimdDouble gmx_simdcall gmx::maskAdd ( SimdDouble  a,
SimdDouble  b,
SimdDBool  m 
)
inlinestatic

Add two double SIMD variables, masked version.

Parameters
aterm1
bterm2
mmask
Returns
a+b where mask is true, a otherwise.
static SimdFloat gmx_simdcall gmx::maskzFma ( SimdFloat  a,
SimdFloat  b,
SimdFloat  c,
SimdFBool  m 
)
inlinestatic

SIMD float fused multiply-add, masked version.

Parameters
afactor1
bfactor2
cterm
mmask
Returns
a*b+c where mask is true, 0.0 otherwise.
static SimdDouble gmx_simdcall gmx::maskzFma ( SimdDouble  a,
SimdDouble  b,
SimdDouble  c,
SimdDBool  m 
)
inlinestatic

SIMD double fused multiply-add, masked version.

Parameters
afactor1
bfactor2
cterm
mmask
Returns
a*b+c where mask is true, 0.0 otherwise.
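
The masked ("maskz") semantics can be modelled lane by lane as below. This is a sketch under stated assumptions: FakeSimdDouble, FakeSimdDBool, the 4-lane width, and the function names are hypothetical, and a plain bool array stands in for the implementation-dependent SIMD boolean type.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>

// Hypothetical 4-lane stand-ins for SimdDouble and SimdDBool.
constexpr std::size_t c_width = 4;
using FakeSimdDouble          = std::array<double, c_width>;
using FakeSimdDBool           = std::array<bool, c_width>;

// Scalar model: a*b+c where the mask is true, 0.0 in masked-out lanes.
inline FakeSimdDouble maskzFmaSketch(const FakeSimdDouble& a,
                                     const FakeSimdDouble& b,
                                     const FakeSimdDouble& c,
                                     const FakeSimdDBool&  m)
{
    FakeSimdDouble result{};
    for (std::size_t i = 0; i < c_width; i++)
    {
        result[i] = m[i] ? std::fma(a[i], b[i], c[i]) : 0.0;
    }
    return result;
}

// Usage: lanes 1 and 3 are masked out and therefore come back as 0.0.
inline void maskzFmaExample()
{
    const FakeSimdDouble r = maskzFmaSketch({ 1.0, 2.0, 3.0, 4.0 },
                                            { 2.0, 2.0, 2.0, 2.0 },
                                            { 1.0, 1.0, 1.0, 1.0 },
                                            { true, false, true, false });
    assert(r[0] == 3.0 && r[1] == 0.0 && r[2] == 7.0 && r[3] == 0.0);
}
```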
static SimdFloat gmx_simdcall gmx::maskzMul ( SimdFloat  a,
SimdFloat  b,
SimdFBool  m 
)
inlinestatic

Multiply two float SIMD variables, masked version.

Parameters
afactor1
bfactor2
mmask
Returns
a*b where mask is true, 0.0 otherwise.
static SimdDouble gmx_simdcall gmx::maskzMul ( SimdDouble  a,
SimdDouble  b,
SimdDBool  m 
)
inlinestatic

Multiply two double SIMD variables, masked version.

Parameters
afactor1
bfactor2
mmask
Returns
a*b where mask is true, 0.0 otherwise.
static SimdFloat gmx_simdcall gmx::maskzRcp ( SimdFloat  x,
SimdFBool  m 
)
inlinestatic

SIMD float 1.0/x lookup, masked version.

This is a low-level instruction that should only be called from routines implementing the reciprocal in simd_math.h.

Parameters
xArgument, x>0 for entries where mask is true.
mMask
Returns
Approximation of 1/x, accuracy is GMX_SIMD_RCP_BITS. The result for masked-out entries will be 0.0.
static SimdDouble gmx_simdcall gmx::maskzRcp ( SimdDouble  x,
SimdDBool  m 
)
inlinestatic

SIMD double 1.0/x lookup, masked version.

This is a low-level instruction that should only be called from routines implementing the reciprocal in simd_math.h.

Parameters
xArgument, x>0 for entries where mask is true.
mMask
Returns
Approximation of 1/x, accuracy is GMX_SIMD_RCP_BITS. The result for masked-out entries will be 0.0.
static SimdFloat gmx_simdcall gmx::maskzRsqrt ( SimdFloat  x,
SimdFBool  m 
)
inlinestatic

SIMD float 1.0/sqrt(x) lookup, masked version.

This is a low-level instruction that should only be called from routines implementing the inverse square root in simd_math.h.

Parameters
xArgument, x>0 for entries where mask is true.
mMask
Returns
Approximation of 1/sqrt(x), accuracy is GMX_SIMD_RSQRT_BITS. The result for masked-out entries will be 0.0.
static SimdDouble gmx_simdcall gmx::maskzRsqrt ( SimdDouble  x,
SimdDBool  m 
)
inlinestatic

SIMD double 1.0/sqrt(x) lookup, masked version.

This is a low-level instruction that should only be called from routines implementing the inverse square root in simd_math.h.

Parameters
xArgument, x>0 for entries where mask is true.
mMask
Returns
Approximation of 1/sqrt(x), accuracy is GMX_SIMD_RSQRT_BITS. The result for masked-out entries will be 0.0.
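
One practical consequence of the masked lookup is that masked-out lanes may hold zero or negative values without producing infinities or NaN. The scalar sketch below illustrates this. It computes 1/sqrt(x) at full precision, whereas the documented routine only promises GMX_SIMD_RSQRT_BITS of accuracy. FakeSimdFloat, FakeSimdFBool, and maskzRsqrtSketch are hypothetical names.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>

// Hypothetical 4-lane stand-ins for SimdFloat and SimdFBool.
constexpr std::size_t c_width = 4;
using FakeSimdFloat           = std::array<float, c_width>;
using FakeSimdFBool           = std::array<bool, c_width>;

// Scalar model: 1/sqrt(x) where the mask is true, 0.0 elsewhere.
inline FakeSimdFloat maskzRsqrtSketch(const FakeSimdFloat& x, const FakeSimdFBool& m)
{
    FakeSimdFloat result{};
    for (std::size_t i = 0; i < c_width; i++)
    {
        if (m[i])
        {
            // Precondition from the parameter description: x > 0 where the mask is true.
            assert(x[i] > 0.0F);
            result[i] = 1.0F / std::sqrt(x[i]);
        }
        else
        {
            // Masked-out lanes are never evaluated, so x may be 0 or negative here.
            result[i] = 0.0F;
        }
    }
    return result;
}
```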
static Simd4Float gmx_simdcall gmx::max ( Simd4Float  a,
Simd4Float  b 
)
inlinestatic

Set each SIMD4 element to the largest from two variables.

Parameters
aAny floating-point value
bAny floating-point value
Returns
max(a,b) for each element.
static Simd4Double gmx_simdcall gmx::max ( Simd4Double  a,
Simd4Double  b 
)
inlinestatic

Set each SIMD4 element to the largest from two variables.

Parameters
aAny floating-point value
bAny floating-point value
Returns
max(a,b) for each element.
static SimdFloat gmx_simdcall gmx::max ( SimdFloat  a,
SimdFloat  b 
)
inlinestatic

Set each SIMD float element to the largest from two variables.

Parameters
aAny floating-point value
bAny floating-point value
Returns
max(a,b) for each element.
static SimdDouble gmx_simdcall gmx::max ( SimdDouble  a,
SimdDouble  b 
)
inlinestatic

Set each SIMD double element to the largest from two variables.

Parameters
aAny floating-point value
bAny floating-point value
Returns
max(a,b) for each element.
static Simd4Float gmx_simdcall gmx::min ( Simd4Float  a,
Simd4Float  b 
)
inlinestatic

Set each SIMD4 element to the smallest from two variables.

Parameters
aAny floating-point value
bAny floating-point value
Returns
min(a,b) for each element.
static Simd4Double gmx_simdcall gmx::min ( Simd4Double  a,
Simd4Double  b 
)
inlinestatic

Set each SIMD4 element to the smallest from two variables.

Parameters
aAny floating-point value
bAny floating-point value
Returns
min(a,b) for each element.
static SimdFloat gmx_simdcall gmx::min ( SimdFloat  a,
SimdFloat  b 
)
inlinestatic

Set each SIMD float element to the smallest from two variables.

Parameters
aAny floating-point value
bAny floating-point value
Returns
min(a,b) for each element.
static SimdDouble gmx_simdcall gmx::min ( SimdDouble  a,
SimdDouble  b 
)
inlinestatic

Set each SIMD double element to the smallest from two variables.

Parameters
aAny floating-point value
bAny floating-point value
Returns
min(a,b) for each element.
static Simd4FBool gmx_simdcall gmx::operator!= ( Simd4Float  a,
Simd4Float  b 
)
inlinestatic

a!=b for SIMD4 float

Parameters
avalue1
bvalue2
Returns
Each element of the boolean will be set to true if a!=b.
static Simd4DBool gmx_simdcall gmx::operator!= ( Simd4Double  a,
Simd4Double  b 
)
inlinestatic

a!=b for SIMD4 double

Parameters
avalue1
bvalue2
Returns
Each element of the boolean will be set to true if a!=b.
static SimdFBool gmx_simdcall gmx::operator!= ( SimdFloat  a,
SimdFloat  b 
)
inlinestatic

SIMD a!=b for single SIMD.

Parameters
avalue1
bvalue2
Returns
Each element of the boolean will be set to true if a!=b.

Beware that exact floating-point comparisons are difficult.

static SimdDBool gmx_simdcall gmx::operator!= ( SimdDouble  a,
SimdDouble  b 
)
inlinestatic

SIMD a!=b for double SIMD.

Parameters
avalue1
bvalue2
Returns
Each element of the boolean will be set to true if a!=b.

Beware that exact floating-point comparisons are difficult.

static Simd4Float gmx_simdcall gmx::operator& ( Simd4Float  a,
Simd4Float  b 
)
inlinestatic

Bitwise and for two SIMD4 float variables.

Supported if GMX_SIMD_HAVE_LOGICAL is 1.

Parameters
adata1
bdata2
Returns
data1 & data2
static Simd4Double gmx_simdcall gmx::operator& ( Simd4Double  a,
Simd4Double  b 
)
inlinestatic

Bitwise and for two SIMD4 double variables.

Supported if GMX_SIMD_HAVE_LOGICAL is 1.

Parameters
adata1
bdata2
Returns
data1 & data2
static SimdFloat gmx_simdcall gmx::operator& ( SimdFloat  a,
SimdFloat  b 
)
inlinestatic

Bitwise and for two SIMD float variables.

Supported if GMX_SIMD_HAVE_LOGICAL is 1.

Parameters
adata1
bdata2
Returns
data1 & data2
static SimdDouble gmx_simdcall gmx::operator& ( SimdDouble  a,
SimdDouble  b 
)
inlinestatic

Bitwise and for two SIMD double variables.

Supported if GMX_SIMD_HAVE_LOGICAL is 1.

Parameters
adata1
bdata2
Returns
data1 & data2
static SimdFInt32 gmx_simdcall gmx::operator& ( SimdFInt32  a,
SimdFInt32  b 
)
inlinestatic

Integer SIMD bitwise and.

Available if GMX_SIMD_HAVE_FINT32_LOGICAL is 1.

Note
You cannot use this operation directly to select based on a boolean SIMD variable, since booleans are separate from integer SIMD. If that is what you need, have a look at gmx::selectByMask instead.
Parameters
afirst integer SIMD
bsecond integer SIMD
Returns
a & b (bitwise and)
static SimdDInt32 gmx_simdcall gmx::operator& ( SimdDInt32  a,
SimdDInt32  b 
)
inlinestatic

Integer SIMD bitwise and.

Available if GMX_SIMD_HAVE_DINT32_LOGICAL is 1.

Note
You cannot use this operation directly to select based on a boolean SIMD variable, since booleans are separate from integer SIMD. If that is what you need, have a look at gmx::selectByMask instead.
Parameters
afirst integer SIMD
bsecond integer SIMD
Returns
a & b (bitwise and)
static Simd4FBool gmx_simdcall gmx::operator&& ( Simd4FBool  a,
Simd4FBool  b 
)
inlinestatic

Logical and on single precision SIMD4 booleans.

Parameters
alogical vars 1
blogical vars 2
Returns
For each element, the result boolean is true if a & b are true.
Note
This is not necessarily a bitwise operation - the storage format of booleans is implementation-dependent.
static Simd4DBool gmx_simdcall gmx::operator&& ( Simd4DBool  a,
Simd4DBool  b 
)
inlinestatic

Logical and on double precision SIMD4 booleans.

Parameters
alogical vars 1
blogical vars 2
Returns
For each element, the result boolean is true if a & b are true.
Note
This is not necessarily a bitwise operation - the storage format of booleans is implementation-dependent.
static SimdFBool gmx_simdcall gmx::operator&& ( SimdFBool  a,
SimdFBool  b 
)
inlinestatic

Logical and on single precision SIMD booleans.

Parameters
alogical vars 1
blogical vars 2
Returns
For each element, the result boolean is true if a & b are true.
Note
This is not necessarily a bitwise operation - the storage format of booleans is implementation-dependent.
static SimdDBool gmx_simdcall gmx::operator&& ( SimdDBool  a,
SimdDBool  b 
)
inlinestatic

Logical and on double precision SIMD booleans.

Parameters
alogical vars 1
blogical vars 2
Returns
For each element, the result boolean is true if a & b are true.
Note
This is not necessarily a bitwise operation - the storage format of booleans is implementation-dependent.
static SimdFIBool gmx_simdcall gmx::operator&& ( SimdFIBool  a,
SimdFIBool  b 
)
inlinestatic

Logical AND on SimdFIBool.

Available if GMX_SIMD_HAVE_FINT32_ARITHMETICS is 1.

Parameters
aSIMD boolean 1
bSIMD boolean 2
Returns
True for elements where both a and b are true.
static SimdDIBool gmx_simdcall gmx::operator&& ( SimdDIBool  a,
SimdDIBool  b 
)
inlinestatic

Logical AND on SimdDIBool.

Available if GMX_SIMD_HAVE_DINT32_ARITHMETICS is 1.

Parameters
aSIMD boolean 1
bSIMD boolean 2
Returns
True for elements where both a and b are true.
static Simd4Float gmx_simdcall gmx::operator* ( Simd4Float  a,
Simd4Float  b 
)
inlinestatic

Multiply two SIMD4 variables.

Parameters
afactor1
bfactor2
Returns
a*b.
static Simd4Double gmx_simdcall gmx::operator* ( Simd4Double  a,
Simd4Double  b 
)
inlinestatic

Multiply two SIMD4 variables.

Parameters
afactor1
bfactor2
Returns
a*b.
static SimdFloat gmx_simdcall gmx::operator* ( SimdFloat  a,
SimdFloat  b 
)
inlinestatic

Multiply two float SIMD variables.

Parameters
afactor1
bfactor2
Returns
a*b.
static SimdDouble gmx_simdcall gmx::operator* ( SimdDouble  a,
SimdDouble  b 
)
inlinestatic

Multiply two double SIMD variables.

Parameters
afactor1
bfactor2
Returns
a*b.
static SimdFInt32 gmx_simdcall gmx::operator* ( SimdFInt32  a,
SimdFInt32  b 
)
inlinestatic

Multiply SIMD integers.

This routine is only available if GMX_SIMD_HAVE_FINT32_ARITHMETICS (single) or GMX_SIMD_HAVE_DINT32_ARITHMETICS (double) is 1.

Parameters
afactor1
bfactor2
Returns
a*b.
Note
Only the low 32 bits are retained, so this can overflow.
static SimdDInt32 gmx_simdcall gmx::operator* ( SimdDInt32  a,
SimdDInt32  b 
)
inlinestatic

Multiply SIMD integers.

Available if GMX_SIMD_HAVE_DINT32_ARITHMETICS is 1.

Parameters
afactor1
bfactor2
Returns
a*b.
Note
Only the low 32 bits are retained, so this can overflow.
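
The overflow behaviour mentioned in the note can be shown for a single lane. The sketch below performs the multiplication in unsigned arithmetic, which wraps modulo 2^32, and then reinterprets the bits as a signed value. mulLow32Sketch is a hypothetical helper for illustration, not a GROMACS function.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Scalar model of one lane: keep only the low 32 bits of the product.
inline std::int32_t mulLow32Sketch(std::int32_t a, std::int32_t b)
{
    // Unsigned multiplication wraps modulo 2^32 instead of overflowing.
    const std::uint32_t product = static_cast<std::uint32_t>(a) * static_cast<std::uint32_t>(b);
    // Copy the bit pattern back into a signed 32-bit integer.
    std::int32_t result;
    std::memcpy(&result, &product, sizeof(result));
    return result;
}

// Usage: 65536 * 65536 = 2^32, whose low 32 bits are all zero.
inline void mulLow32Example()
{
    assert(mulLow32Sketch(65536, 65536) == 0);
}
```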
static Simd4Double gmx_simdcall gmx::operator+ ( Simd4Double  a,
Simd4Double  b 
)
inlinestatic

Add two double SIMD4 variables.

Parameters
aterm1
bterm2
Returns
a+b
static Simd4Float gmx_simdcall gmx::operator+ ( Simd4Float  a,
Simd4Float  b 
)
inlinestatic

Add two float SIMD4 variables.

Parameters
aterm1
bterm2
Returns
a+b
static SimdFloat gmx_simdcall gmx::operator+ ( SimdFloat  a,
SimdFloat  b 
)
inlinestatic

Add two float SIMD variables.

Parameters
aterm1
bterm2
Returns
a+b
static SimdDouble gmx_simdcall gmx::operator+ ( SimdDouble  a,
SimdDouble  b 
)
inlinestatic

Add two double SIMD variables.

Parameters
aterm1
bterm2
Returns
a+b
static SimdFInt32 gmx_simdcall gmx::operator+ ( SimdFInt32  a,
SimdFInt32  b 
)
inlinestatic

Add SIMD integers.

This routine is only available if GMX_SIMD_HAVE_FINT32_ARITHMETICS (single) or GMX_SIMD_HAVE_DINT32_ARITHMETICS (double) is 1.

Parameters
aterm1
bterm2
Returns
a+b
static SimdDInt32 gmx_simdcall gmx::operator+ ( SimdDInt32  a,
SimdDInt32  b 
)
inlinestatic

Add SIMD integers.

Available if GMX_SIMD_HAVE_DINT32_ARITHMETICS is 1.

Parameters
aterm1
bterm2
Returns
a+b
static Simd4Double gmx_simdcall gmx::operator- ( Simd4Double  a,
Simd4Double  b 
)
inlinestatic

Subtract two SIMD4 variables.

Parameters
aterm1
bterm2
Returns
a-b
static Simd4Float gmx_simdcall gmx::operator- ( Simd4Float  a,
Simd4Float  b 
)
inlinestatic

Subtract two SIMD4 variables.

Parameters
aterm1
bterm2
Returns
a-b
static Simd4Float gmx_simdcall gmx::operator- ( Simd4Float  a)
inlinestatic

SIMD4 floating-point negate.

Parameters
aSIMD4 floating-point value
Returns
-a
static Simd4Double gmx_simdcall gmx::operator- ( Simd4Double  a)
inlinestatic

SIMD4 floating-point negate.

Parameters
aSIMD4 floating-point value
Returns
-a
static SimdFloat gmx_simdcall gmx::operator- ( SimdFloat  a,
SimdFloat  b 
)
inlinestatic

Subtract two float SIMD variables.

Parameters
aterm1
bterm2
Returns
a-b
static SimdDouble gmx_simdcall gmx::operator- ( SimdDouble  a,
SimdDouble  b 
)
inlinestatic

Subtract two double SIMD variables.

Parameters
aterm1
bterm2
Returns
a-b
static SimdFloat gmx_simdcall gmx::operator- ( SimdFloat  a)
inlinestatic

SIMD single precision negate.

Parameters
aSIMD single precision value
Returns
-a
static SimdDouble gmx_simdcall gmx::operator- ( SimdDouble  a)
inlinestatic

SIMD double precision negate.

Parameters
aSIMD double precision value
Returns
-a
static SimdFInt32 gmx_simdcall gmx::operator- ( SimdFInt32  a,
SimdFInt32  b 
)
inlinestatic

Subtract SIMD integers.

This routine is only available if GMX_SIMD_HAVE_FINT32_ARITHMETICS (single) or GMX_SIMD_HAVE_DINT32_ARITHMETICS (double) is 1.

Parameters
aterm1
bterm2
Returns
a-b
static SimdDInt32 gmx_simdcall gmx::operator- ( SimdDInt32  a,
SimdDInt32  b 
)
inlinestatic

Subtract SIMD integers.

Available if GMX_SIMD_HAVE_DINT32_ARITHMETICS is 1.

Parameters
aterm1
bterm2
Returns
a-b
static Simd4FBool gmx_simdcall gmx::operator< ( Simd4Float  a,
Simd4Float  b 
)
inlinestatic

a<b for SIMD4 float

Parameters
avalue1
bvalue2
Returns
Each element of the boolean will be set to true if a<b.
static Simd4DBool gmx_simdcall gmx::operator< ( Simd4Double  a,
Simd4Double  b 
)
inlinestatic

a<b for SIMD4 double

Parameters
avalue1
bvalue2
Returns
Each element of the boolean will be set to true if a<b.
static SimdFBool gmx_simdcall gmx::operator< ( SimdFloat  a,
SimdFloat  b 
)
inlinestatic

SIMD a<b for single SIMD.

Parameters
avalue1
bvalue2
Returns
Each element of the boolean will be set to true if a<b.
static SimdDBool gmx_simdcall gmx::operator< ( SimdDouble  a,
SimdDouble  b 
)
inlinestatic

SIMD a<b for double SIMD.

Parameters
avalue1
bvalue2
Returns
Each element of the boolean will be set to true if a<b.
static SimdFIBool gmx_simdcall gmx::operator< ( SimdFInt32  a,
SimdFInt32  b 
)
inlinestatic

Less-than comparison of two SIMD integers corresponding to float values.

Available if GMX_SIMD_HAVE_FINT32_ARITHMETICS is 1.

Parameters
aSIMD integer1
bSIMD integer2
Returns
SIMD integer boolean with true for elements where a<b
static SimdDIBool gmx_simdcall gmx::operator< ( SimdDInt32  a,
SimdDInt32  b 
)
inlinestatic

Less-than comparison of two SIMD integers corresponding to double values.

Available if GMX_SIMD_HAVE_DINT32_ARITHMETICS is 1.

Parameters
aSIMD integer1
bSIMD integer2
Returns
SIMD integer boolean with true for elements where a<b
static Simd4FBool gmx_simdcall gmx::operator<= ( Simd4Float  a,
Simd4Float  b 
)
inlinestatic

a<=b for SIMD4 float.

Parameters
avalue1
bvalue2
Returns
Each element of the boolean will be set to true if a<=b.
static Simd4DBool gmx_simdcall gmx::operator<= ( Simd4Double  a,
Simd4Double  b 
)
inlinestatic

a<=b for SIMD4 double.

Parameters
avalue1
bvalue2
Returns
Each element of the boolean will be set to true if a<=b.
static SimdFBool gmx_simdcall gmx::operator<= ( SimdFloat  a,
SimdFloat  b 
)
inlinestatic

SIMD a<=b for single SIMD.

Parameters
avalue1
bvalue2
Returns
Each element of the boolean will be set to true if a<=b.
static SimdDBool gmx_simdcall gmx::operator<= ( SimdDouble  a,
SimdDouble  b 
)
inlinestatic

SIMD a<=b for double SIMD.

Parameters
avalue1
bvalue2
Returns
Each element of the boolean will be set to true if a<=b.
static Simd4FBool gmx_simdcall gmx::operator== ( Simd4Float  a,
Simd4Float  b 
)
inlinestatic

a==b for SIMD4 float

Parameters
avalue1
bvalue2
Returns
Each element of the boolean will be set to true if a==b.
static Simd4DBool gmx_simdcall gmx::operator== ( Simd4Double  a,
Simd4Double  b 
)
inlinestatic

a==b for SIMD4 double

Parameters
avalue1
bvalue2
Returns
Each element of the boolean will be set to true if a==b.
static SimdFBool gmx_simdcall gmx::operator== ( SimdFloat  a,
SimdFloat  b 
)
inlinestatic

SIMD a==b for single SIMD.

Parameters
avalue1
bvalue2
Returns
Each element of the boolean will be set to true if a==b.

Beware that exact floating-point comparisons are difficult.

static SimdDBool gmx_simdcall gmx::operator== ( SimdDouble  a,
SimdDouble  b 
)
inlinestatic

SIMD a==b for double SIMD.

Parameters
avalue1
bvalue2
Returns
Each element of the boolean will be set to true if a==b.

Beware that exact floating-point comparisons are difficult.

static SimdFIBool gmx_simdcall gmx::operator== ( SimdFInt32  a,
SimdFInt32  b 
)
inlinestatic

Equality comparison of two integers corresponding to float values.

Available if GMX_SIMD_HAVE_FINT32_ARITHMETICS is 1.

Parameters
aSIMD integer1
bSIMD integer2
Returns
SIMD integer boolean with true for elements where a==b
static SimdDIBool gmx_simdcall gmx::operator== ( SimdDInt32  a,
SimdDInt32  b 
)
inlinestatic

Equality comparison of two integers corresponding to double values.

Available if GMX_SIMD_HAVE_DINT32_ARITHMETICS is 1.

Parameters
aSIMD integer1
bSIMD integer2
Returns
SIMD integer boolean with true for elements where a==b
static Simd4Float gmx_simdcall gmx::operator^ ( Simd4Float  a,
Simd4Float  b 
)
inlinestatic

Bitwise xor for two SIMD4 float variables.

Available if GMX_SIMD_HAVE_LOGICAL is 1.

Parameters
adata1
bdata2
Returns
data1 ^ data2
static Simd4Double gmx_simdcall gmx::operator^ ( Simd4Double  a,
Simd4Double  b 
)
inlinestatic

Bitwise xor for two SIMD4 double variables.

Available if GMX_SIMD_HAVE_LOGICAL is 1.

Parameters
adata1
bdata2
Returns
data1 ^ data2
static SimdFloat gmx_simdcall gmx::operator^ ( SimdFloat  a,
SimdFloat  b 
)
inlinestatic

Bitwise xor for SIMD float.

Available if GMX_SIMD_HAVE_LOGICAL is 1.

Parameters
adata1
bdata2
Returns
data1 ^ data2
static SimdDouble gmx_simdcall gmx::operator^ ( SimdDouble  a,
SimdDouble  b 
)
inlinestatic

Bitwise xor for SIMD double.

Available if GMX_SIMD_HAVE_LOGICAL is 1.

Parameters
adata1
bdata2
Returns
data1 ^ data2
static SimdFInt32 gmx_simdcall gmx::operator^ ( SimdFInt32  a,
SimdFInt32  b 
)
inlinestatic

Integer SIMD bitwise xor.

Available if GMX_SIMD_HAVE_FINT32_LOGICAL is 1.

Parameters
afirst integer SIMD
bsecond integer SIMD
Returns
a ^ b (bitwise xor)
static SimdDInt32 gmx_simdcall gmx::operator^ ( SimdDInt32  a,
SimdDInt32  b 
)
inlinestatic

Integer SIMD bitwise xor.

Available if GMX_SIMD_HAVE_DINT32_LOGICAL is 1.

Parameters
afirst integer SIMD
bsecond integer SIMD
Returns
a ^ b (bitwise xor)
static Simd4Double gmx_simdcall gmx::operator| ( Simd4Double  a,
Simd4Double  b 
)
inlinestatic

Bitwise or for two SIMD4 doubles.

Available if GMX_SIMD_HAVE_LOGICAL is 1.

Parameters
adata1
bdata2
Returns
data1 | data2
static Simd4Float gmx_simdcall gmx::operator| ( Simd4Float  a,
Simd4Float  b 
)
inlinestatic

Bitwise or for two SIMD4 floats.

Available if GMX_SIMD_HAVE_LOGICAL is 1.

Parameters
adata1
bdata2
Returns
data1 | data2
static SimdFloat gmx_simdcall gmx::operator| ( SimdFloat  a,
SimdFloat  b 
)
inlinestatic

Bitwise or for SIMD float.

Available if GMX_SIMD_HAVE_LOGICAL is 1.

Parameters
adata1
bdata2
Returns
data1 | data2
static SimdDouble gmx_simdcall gmx::operator| ( SimdDouble  a,
SimdDouble  b 
)
inlinestatic

Bitwise or for SIMD double.

Available if GMX_SIMD_HAVE_LOGICAL is 1.

Parameters
adata1
bdata2
Returns
data1 | data2
static SimdFInt32 gmx_simdcall gmx::operator| ( SimdFInt32  a,
SimdFInt32  b 
)
inlinestatic

Integer SIMD bitwise or.

Available if GMX_SIMD_HAVE_FINT32_LOGICAL is 1.

Parameters
afirst integer SIMD
bsecond integer SIMD
Returns
a | b (bitwise or)
static SimdDInt32 gmx_simdcall gmx::operator| ( SimdDInt32  a,
SimdDInt32  b 
)
inlinestatic

Integer SIMD bitwise or.

Available if GMX_SIMD_HAVE_DINT32_LOGICAL is 1.

Parameters
afirst integer SIMD
bsecond integer SIMD
Returns
a | b (bitwise or)
static Simd4FBool gmx_simdcall gmx::operator|| ( Simd4FBool  a,
Simd4FBool  b 
)
inlinestatic

Logical or on single precision SIMD4 booleans.

Parameters
alogical vars 1
blogical vars 2
Returns
For each element, the result boolean is true if a or b is true.

Note that this is not necessarily a bitwise operation - the storage format of booleans is implementation-dependent.

static Simd4DBool gmx_simdcall gmx::operator|| ( Simd4DBool  a,
Simd4DBool  b 
)
inlinestatic

Logical or on double precision SIMD4 booleans.

Parameters
alogical vars 1
blogical vars 2
Returns
For each element, the result boolean is true if a or b is true.

Note that this is not necessarily a bitwise operation - the storage format of booleans is implementation-dependent.

static SimdFBool gmx_simdcall gmx::operator|| ( SimdFBool  a,
SimdFBool  b 
)
inlinestatic

Logical or on single precision SIMD booleans.

Parameters
alogical vars 1
blogical vars 2
Returns
For each element, the result boolean is true if a or b is true.

Note that this is not necessarily a bitwise operation - the storage format of booleans is implementation-dependent.

static SimdDBool gmx_simdcall gmx::operator|| ( SimdDBool  a,
SimdDBool  b 
)
inlinestatic

Logical or on double precision SIMD booleans.

Parameters
alogical vars 1
blogical vars 2
Returns
For each element, the result boolean is true if a or b is true.

Note that this is not necessarily a bitwise operation - the storage format of booleans is implementation-dependent.

static SimdFIBool gmx_simdcall gmx::operator|| ( SimdFIBool  a,
SimdFIBool  b 
)
inlinestatic

Logical OR on SimdFIBool.

Available if GMX_SIMD_HAVE_FINT32_ARITHMETICS is 1.

Parameters
aSIMD boolean 1
bSIMD boolean 2
Returns
True for elements where either a or b is true.
static SimdDIBool gmx_simdcall gmx::operator|| ( SimdDIBool  a,
SimdDIBool  b 
)
inlinestatic

Logical OR on SimdDIBool.

Available if GMX_SIMD_HAVE_DINT32_ARITHMETICS is 1.

Parameters
aSIMD boolean 1
bSIMD boolean 2
Returns
True for elements where either a or b is true.
static SimdFloat gmx_simdcall gmx::rcp ( SimdFloat  x)
inlinestatic

SIMD float 1.0/x lookup.

This is a low-level instruction that should only be called from routines implementing the reciprocal in simd_math.h.

Parameters
xArgument, x!=0
Returns
Approximation of 1/x, accuracy is GMX_SIMD_RCP_BITS.
static SimdDouble gmx_simdcall gmx::rcp ( SimdDouble  x)
inlinestatic

SIMD double 1.0/x lookup.

This is a low-level instruction that should only be called from routines implementing the reciprocal in simd_math.h.

Parameters
xArgument, x!=0
Returns
Approximation of 1/x, accuracy is GMX_SIMD_RCP_BITS.
static float gmx_simdcall gmx::reduce ( Simd4Float  a)
inlinestatic

Return sum of all elements in SIMD4 float variable.

Parameters
aSIMD4 variable to reduce/sum.
Returns
The sum of all elements in the argument variable.
static double gmx_simdcall gmx::reduce ( Simd4Double  a)
inlinestatic

Return sum of all elements in SIMD4 double variable.

Parameters
aSIMD4 variable to reduce/sum.
Returns
The sum of all elements in the argument variable.
static float gmx_simdcall gmx::reduce ( SimdFloat  a)
inlinestatic

Return sum of all elements in SIMD float variable.

Parameters
aSIMD variable to reduce/sum.
Returns
The sum of all elements in the argument variable.
static double gmx_simdcall gmx::reduce ( SimdDouble  a)
inlinestatic

Return sum of all elements in SIMD double variable.

Parameters
aSIMD variable to reduce/sum.
Returns
The sum of all elements in the argument variable.
static double gmx_simdcall gmx::reduceIncr4ReturnSum ( double *  m,
SimdDouble  v0,
SimdDouble  v1,
SimdDouble  v2,
SimdDouble  v3 
)
inlinestatic

Reduce each of four SIMD doubles, add those values to four consecutive doubles in memory, return sum.

Parameters
mPointer to memory where four doubles should be incremented
v0SIMD variable whose sum should be added to m[0]
v1SIMD variable whose sum should be added to m[1]
v2SIMD variable whose sum should be added to m[2]
v3SIMD variable whose sum should be added to m[3]
Returns
Sum of all elements in the four SIMD variables.

The pointer m must be aligned to the smaller of four elements and the floating-point SIMD width.

Note
This is a special routine intended for the GROMACS nonbonded kernels. It is used in the epilogue of the outer loop, where each variable contains the unrolled forces for one outer-loop particle along a single coordinate (for example, four x-coordinate force variables). These should be summed and added to the force array in memory. Since we always work with contiguous SIMD layout, we can use efficient aligned loads/stores. When calculating the virial, we also need the total sum of all forces for each coordinate. This is provided as the return value. For routines that do not need this sum, the extra code will be optimized away completely if you just ignore the return value (checked with gcc-4.9.1 and clang-3.6 for AVX).
static float gmx_simdcall gmx::reduceIncr4ReturnSum ( float *  m,
SimdFloat  v0,
SimdFloat  v1,
SimdFloat  v2,
SimdFloat  v3 
)
inline static

Reduce each of four SIMD floats, add those values to four consecutive floats in memory, return sum.

Parameters
m  Pointer to memory where four floats should be incremented
v0  SIMD variable whose sum should be added to m[0]
v1  SIMD variable whose sum should be added to m[1]
v2  SIMD variable whose sum should be added to m[2]
v3  SIMD variable whose sum should be added to m[3]
Returns
Sum of all elements in the four SIMD variables.

The pointer m must be aligned to the smaller of four elements and the floating-point SIMD width.

Note
This is a special routine intended for the Gromacs nonbonded kernels. It is used in the epilogue of the outer loop, where each variable contains the unrolled forces for one outer-loop particle along a single coordinate (for example, four x-coordinate force variables). These should be summed and added to the force array in memory. Since we always work with a contiguous SIMD layout, we can use efficient aligned loads/stores. When calculating the virial, we also need the total sum of all forces for each coordinate; this is provided as the return value. For routines that do not need it, the extra code will be optimized away completely if you just ignore the return value (checked with gcc-4.9.1 and clang-3.6 for AVX).
static double gmx_simdcall gmx::reduceIncr4ReturnSumHsimd ( double *  m,
SimdDouble  v0,
SimdDouble  v1 
)
inline static

Reduce the 4 half-SIMD-width doubles in 2 SIMD variables (sum halves), increment four consecutive doubles in memory, return sum.

Parameters
m  Pointer to memory where the four values should be incremented
v0  Variable whose half-SIMD sums should be added to m[0]/m[1], respectively.
v1  Variable whose half-SIMD sums should be added to m[2]/m[3], respectively.
Returns
Sum of all elements in the two SIMD variables.

The pointer m must be aligned, but only to the smaller of four elements and the floating-point SIMD width.

Note
This is the half-SIMD-width version of reduceIncr4ReturnSum(). The only difference is that the four half-SIMD inputs needed are present in the low/high halves of the two SIMD arguments.

Available if GMX_SIMD_HAVE_HSIMD_UTIL_DOUBLE is 1.
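A scalar sketch of the half-SIMD variant follows. It assumes, purely for illustration, a width of 4 lanes and that the "low" half is lanes [0, width/2) while the "high" half is the remaining lanes; all names (FakeSimdDouble, halfSumSketch, reduceIncr4ReturnSumHsimdSketch) are hypothetical and not GROMACS API.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical lane count; the real width is GMX_SIMD_DOUBLE_WIDTH.
constexpr std::size_t kWidthSketch = 4;
constexpr std::size_t kHalfSketch  = kWidthSketch / 2;
using FakeSimdDouble = std::array<double, kWidthSketch>;

// Sum of kHalfSketch consecutive lanes starting at firstLane.
inline double halfSumSketch(const FakeSimdDouble& v, std::size_t firstLane)
{
    double s = 0.0;
    for (std::size_t i = 0; i < kHalfSketch; ++i)
    {
        s += v[firstLane + i];
    }
    return s;
}

// Scalar model: low/high halves of v0 go to m[0]/m[1], those of v1 to m[2]/m[3].
inline double reduceIncr4ReturnSumHsimdSketch(double*               m,
                                              const FakeSimdDouble& v0,
                                              const FakeSimdDouble& v1)
{
    const double s0 = halfSumSketch(v0, 0);
    const double s1 = halfSumSketch(v0, kHalfSketch);
    const double s2 = halfSumSketch(v1, 0);
    const double s3 = halfSumSketch(v1, kHalfSketch);
    m[0] += s0;
    m[1] += s1;
    m[2] += s2;
    m[3] += s3;
    return s0 + s1 + s2 + s3;
}
```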

static float gmx_simdcall gmx::reduceIncr4ReturnSumHsimd ( float *  m,
SimdFloat  v0,
SimdFloat  v1 
)
inline static

Reduce the 4 half-SIMD-width floats in 2 SIMD variables (sum halves), increment four consecutive floats in memory, return sum.

Parameters
m  Pointer to memory where the four values should be incremented
v0  Variable whose half-SIMD sums should be added to m[0]/m[1], respectively.
v1  Variable whose half-SIMD sums should be added to m[2]/m[3], respectively.
Returns
Sum of all elements in the two SIMD variables.

The pointer m must be aligned, but only to the smaller of four elements and the floating-point SIMD width.

Note
This is the half-SIMD-width version of reduceIncr4ReturnSum(). The only difference is that the four half-SIMD inputs needed are present in the low/high halves of the two SIMD arguments.

Available if GMX_SIMD_HAVE_HSIMD_UTIL_FLOAT is 1.

static Simd4Float gmx_simdcall gmx::round ( Simd4Float  a)
inline static

SIMD4 Round to nearest integer value (in floating-point format).

Parameters
a  Any floating-point value
Returns
The nearest integer, represented in floating-point format.
static Simd4Double gmx_simdcall gmx::round ( Simd4Double  a)
inline static

SIMD4 Round to nearest integer value (in floating-point format).

Parameters
a  Any floating-point value
Returns
The nearest integer, represented in floating-point format.
static SimdFloat gmx_simdcall gmx::round ( SimdFloat  a)
inline static

SIMD float round to nearest integer value (in floating-point format).

Parameters
a  Any floating-point value
Returns
The nearest integer, represented in floating-point format.
Note
Round mode is implementation defined. The only guarantee is that it is consistent between rounding functions (round, cvtR2I).
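The consistency guarantee in the Note has a close scalar analogue in standard C++, where std::nearbyint and std::lrint both follow the current floating-point rounding mode. The sketch below uses that pair as a stand-in; mapping round() to std::nearbyint and cvtR2I() to std::lrint is an assumption made for illustration, and roundSketch and cvtR2ISketch are hypothetical names.

```cpp
#include <cassert>
#include <cmath>

// Scalar analogue of round(): nearest integer, kept in floating-point format.
// Uses the current rounding mode, so ties are not guaranteed to go either way.
inline double roundSketch(double x)
{
    return std::nearbyint(x);
}

// Scalar analogue of a real-to-integer conversion such as cvtR2I().
// Also uses the current rounding mode, hence it agrees with roundSketch().
inline long cvtR2ISketch(double x)
{
    return std::lrint(x);
}
```

Code that depends on a specific treatment of ties (for example, 0.5 or 2.5) should not rely on either function alone, only on the two agreeing with each other.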
static SimdDouble gmx_simdcall gmx::round ( SimdDouble  a)
inline static

SIMD double round to nearest integer value (in floating-point format).

Parameters
a  Any floating-point value
Returns
The nearest integer, represented in floating-point format.
Note
Round mode is implementation defined. The only guarantee is that it is consistent between rounding functions (round, cvtR2I).
static Simd4Double gmx_simdcall gmx::rsqrt ( Simd4Double  x)
inline static

SIMD4 1.0/sqrt(x) lookup.

This is a low-level instruction that should only be called from routines implementing the inverse square root in simd_math.h.

Parameters
x  Argument, x>0
Returns
Approximation of 1/sqrt(x), accuracy is GMX_SIMD_RSQRT_BITS.
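Because the lookup is only accurate to a limited number of bits, a higher-level routine has to refine it. A common way to do that, and the one assumed in this sketch, is Newton-Raphson iteration, where each step roughly doubles the number of correct bits. The lookup here is faked by perturbing the exact value by a relative error of 2^-10; all function names are hypothetical, and this is not the code in simd_math.h.

```cpp
#include <cassert>
#include <cmath>

// Stand-in for the low-accuracy lookup: the exact value with a fixed relative
// error of 2^-10 (an assumption for illustration only).
inline double rsqrtLookupSketch(double x)
{
    assert(x > 0.0); // the documented precondition
    return (1.0 / std::sqrt(x)) * (1.0 + 1.0 / 1024.0);
}

// One Newton-Raphson step for y ~ 1/sqrt(x): y' = y * (1.5 - 0.5 * x * y * y).
// A relative error e in y becomes roughly 1.5 * e * e in y'.
inline double rsqrtIterSketch(double y, double x)
{
    return y * (1.5 - 0.5 * x * y * y);
}

// Lookup followed by two refinement steps.
inline double invsqrtSketch(double x)
{
    double y = rsqrtLookupSketch(x);
    y        = rsqrtIterSketch(y, x);
    y        = rsqrtIterSketch(y, x);
    return y;
}
```

With a starting error near 1e-3, one step brings the relative error to about 1e-6 and a second step to about 1e-12, which is why the number of iterations needed depends on the accuracy of the lookup.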
static Simd4Float gmx_simdcall gmx::rsqrt ( Simd4Float  x)
inline static

SIMD4 1.0/sqrt(x) lookup.

This is a low-level instruction that should only be called from routines implementing the inverse square root in simd_math.h.

Parameters
x  Argument, x>0
Returns
Approximation of 1/sqrt(x), accuracy is GMX_SIMD_RSQRT_BITS.
static SimdFloat gmx_simdcall gmx::rsqrt ( SimdFloat  x)
inline static

SIMD float 1.0/sqrt(x) lookup.

This is a low-level instruction that should only be called from routines implementing the inverse square root in simd_math.h.

Parameters
x  Argument, x>0
Returns
Approximation of 1/sqrt(x), accuracy is GMX_SIMD_RSQRT_BITS.
static SimdDouble gmx_simdcall gmx::rsqrt ( SimdDouble  x)
inline static

SIMD double 1.0/sqrt(x) lookup.

This is a low-level instruction that should only be called from routines implementing the inverse square root in simd_math.h.

Parameters
x  Argument, x>0
Returns
Approximation of 1/sqrt(x), accuracy is GMX_SIMD_RSQRT_BITS.
static Simd4Float gmx_simdcall gmx::selectByMask ( Simd4Float  a,
Simd4FBool  mask 
)
inline static

Select from single precision SIMD4 variable where boolean is true.

Parameters
a  Floating-point variable to select from
mask  Boolean selector
Returns
For each element, a is selected for true, 0 for false.
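The per-element behaviour can be modelled in scalar code as below. FakeSimd4Float, FakeSimd4FBool and selectByMaskSketch are hypothetical stand-ins for the real types and function; the sketch only shows the lane-wise rule, not how a hardware mask is represented.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

constexpr std::size_t kWidthSketch = 4; // SIMD4 always has four lanes
using FakeSimd4Float = std::array<float, kWidthSketch>;
using FakeSimd4FBool = std::array<bool, kWidthSketch>;

// Lane-wise select: keep a[i] where mask[i] is true, otherwise produce 0.
inline FakeSimd4Float selectByMaskSketch(const FakeSimd4Float& a, const FakeSimd4FBool& mask)
{
    FakeSimd4Float r{};
    for (std::size_t i = 0; i < kWidthSketch; ++i)
    {
        r[i] = mask[i] ? a[i] : 0.0F;
    }
    return r;
}
```

A typical use is to zero out contributions from lanes that fail a test (such as a cutoff check) without branching.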
static Simd4Double gmx_simdcall gmx::selectByMask ( Simd4Double  a,
Simd4DBool  mask 
)
inline static

Select from double precision SIMD4 variable where boolean is true.

Parameters
a  Floating-point variable to select from
mask  Boolean selector
Returns
For each element, a is selected for true, 0 for false.
static SimdFloat gmx_simdcall gmx::selectByMask ( SimdFloat  a,
SimdFBool  mask 
)
inline static

Select from single precision SIMD variable where boolean is true.

Parameters
a  Floating-point variable to select from
mask  Boolean selector
Returns
For each element, a is selected for true, 0 for false.
static SimdDouble gmx_simdcall gmx::selectByMask ( SimdDouble  a,
SimdDBool  mask 
)
inline static

Select from double precision SIMD variable where boolean is true.

Parameters
a  Floating-point variable to select from
mask  Boolean selector
Returns
For each element, a is selected for true, 0 for false.
static SimdFInt32 gmx_simdcall gmx::selectByMask ( SimdFInt32  a,
SimdFIBool  mask 
)
inline static

Select from gmx::SimdFInt32 variable where boolean is true.

Available if GMX_SIMD_HAVE_FINT32_ARITHMETICS is 1.

Parameters
a  SIMD integer to select from
mask  Boolean selector
Returns
Elements from a where mask is true, 0 otherwise.
static SimdDInt32 gmx_simdcall gmx::selectByMask ( SimdDInt32  a,
SimdDIBool  mask 
)
inline static

Select from gmx::SimdDInt32 variable where boolean is true.

Available if GMX_SIMD_HAVE_DINT32_ARITHMETICS is 1.

Parameters
a  SIMD integer to select from
mask  Boolean selector
Returns
Elements from a where mask is true, 0 otherwise.
static Simd4Float gmx_simdcall gmx::selectByNotMask ( Simd4Float  a,
Simd4FBool  mask 
)
inline static

Select from single precision SIMD4 variable where boolean is false.

Parameters
a  Floating-point variable to select from
mask  Boolean selector
Returns
For each element, a is selected for false, 0 for true (sic).
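Since the inverted select keeps exactly the lanes that the regular select would zero, the two can be combined into a lane-wise choice between two inputs. The scalar sketch below illustrates this; every name in it (FakeSimd4Float, FakeSimd4FBool, selectByMaskSketch, selectByNotMaskSketch, blendSketch) is hypothetical and only mirrors the documented per-element rule.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

constexpr std::size_t kWidthSketch = 4; // SIMD4 always has four lanes
using FakeSimd4Float = std::array<float, kWidthSketch>;
using FakeSimd4FBool = std::array<bool, kWidthSketch>;

// Keep a[i] where mask[i] is true, 0 elsewhere.
inline FakeSimd4Float selectByMaskSketch(const FakeSimd4Float& a, const FakeSimd4FBool& mask)
{
    FakeSimd4Float r{};
    for (std::size_t i = 0; i < kWidthSketch; ++i)
    {
        r[i] = mask[i] ? a[i] : 0.0F;
    }
    return r;
}

// Keep a[i] where mask[i] is false, 0 elsewhere (note the inversion).
inline FakeSimd4Float selectByNotMaskSketch(const FakeSimd4Float& a, const FakeSimd4FBool& mask)
{
    FakeSimd4Float r{};
    for (std::size_t i = 0; i < kWidthSketch; ++i)
    {
        r[i] = mask[i] ? 0.0F : a[i];
    }
    return r;
}

// Lane-wise choice: a[i] where mask[i] is false, b[i] where it is true.
inline FakeSimd4Float blendSketch(const FakeSimd4Float& a,
                                  const FakeSimd4Float& b,
                                  const FakeSimd4FBool& mask)
{
    const FakeSimd4Float fromA = selectByNotMaskSketch(a, mask);
    const FakeSimd4Float fromB = selectByMaskSketch(b, mask);
    FakeSimd4Float       r{};
    for (std::size_t i = 0; i < kWidthSketch; ++i)
    {
        r[i] = fromA[i] + fromB[i]; // one of the two terms is always 0
    }
    return r;
}
```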
static Simd4Double gmx_simdcall gmx::selectByNotMask ( Simd4Double  a,
Simd4DBool  mask 
)
inline static

Select from double precision SIMD4 variable where boolean is false.

Parameters
a  Floating-point variable to select from
mask  Boolean selector
Returns
For each element, a is selected for false, 0 for true (sic).
static SimdFloat gmx_simdcall gmx::selectByNotMask ( SimdFloat  a,
SimdFBool  mask 
)
inline static

Select from single precision SIMD variable where boolean is false.

Parameters
a  Floating-point variable to select from
mask  Boolean selector
Returns
For each element, a is selected for false, 0 for true (sic).
static SimdDouble gmx_simdcall gmx::selectByNotMask ( SimdDouble  a,
SimdDBool  mask 
)
inline static

Select from double precision SIMD variable where boolean is false.

Parameters
a  Floating-point variable to select from
mask  Boolean selector
Returns
For each element, a is selected for false, 0 for true (sic).
static SimdFInt32 gmx_simdcall gmx::selectByNotMask ( SimdFInt32  a,
SimdFIBool  mask 
)
inline static

Select from gmx::SimdFInt32 variable where boolean is false.

Available if GMX_SIMD_HAVE_FINT32_ARITHMETICS is 1.

Parameters
a  SIMD integer to select from
mask  Boolean selector
Returns
Elements from a where mask is false, 0 otherwise (sic).
static SimdDInt32 gmx_simdcall gmx::selectByNotMask ( SimdDInt32  a,
SimdDIBool  mask 
)
inline static

Select from gmx::SimdDInt32 variable where boolean is false.

Available if GMX_SIMD_HAVE_DINT32_ARITHMETICS is 1.

Parameters
a  SIMD integer to select from
mask  Boolean selector
Returns
Elements from a where mask is false, 0 otherwise (sic).
static SimdDouble gmx_simdcall gmx::setZeroD ( )
inline static

Set all SIMD double variable elements to 0.0.

You should typically just call gmx::setZero(), which uses proxy objects internally to handle all types rather than adding the suffix used here.

Returns
SIMD 0.0
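The text above mentions that gmx::setZero() relies on proxy objects so that one name serves all types. One way such a pattern can work in C++ is sketched below: the generic function returns a small proxy whose conversion operators forward to the type-specific functions. This is a minimal illustration under assumptions, not the GROMACS implementation; FakeSimdFloat, FakeSimdDouble, SetZeroProxySketch and the *Sketch functions are all hypothetical.

```cpp
#include <cassert>

// Hypothetical stand-ins for two distinct SIMD types (one lane each, for brevity).
struct FakeSimdFloat
{
    float lane0;
};
struct FakeSimdDouble
{
    double lane0;
};

// Suffix-style functions, analogous in spirit to setZeroF() and setZeroD().
inline FakeSimdFloat setZeroFSketch()
{
    return FakeSimdFloat{ 0.0F };
}
inline FakeSimdDouble setZeroDSketch()
{
    return FakeSimdDouble{ 0.0 };
}

// Proxy returned by the generic function. The conversion operator that matches
// the type being initialized selects the right suffixed function.
class SetZeroProxySketch
{
public:
    operator FakeSimdFloat() const { return setZeroFSketch(); }
    operator FakeSimdDouble() const { return setZeroDSketch(); }
};

// Generic entry point: the caller never spells out the suffix.
inline SetZeroProxySketch setZeroSketch()
{
    return {};
}
```

With this in place, both `FakeSimdFloat f = setZeroSketch();` and `FakeSimdDouble d = setZeroSketch();` compile, and the destination type decides which function runs.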
static SimdDInt32 gmx_simdcall gmx::setZeroDI ( )
inline static

Set all SIMD (double) integer variable elements to 0.

You should typically just call gmx::setZero(), which uses proxy objects internally to handle all types rather than adding the suffix used here.

Returns
SIMD 0
static SimdFloat gmx_simdcall gmx::setZeroF ( )
inline static

Set all SIMD float variable elements to 0.0.

You should typically just call gmx::setZero(), which uses proxy objects internally to handle all types rather than adding the suffix used here.

Returns
SIMD 0.0F
static SimdFInt32 gmx_simdcall gmx::setZeroFI ( )
inline static

Set all SIMD (float) integer variable elements to 0.

You should typically just call gmx::setZero(), which uses proxy objects internally to handle all types rather than adding the suffix used here.

Returns
SIMD 0
static Simd4Double gmx_simdcall gmx::simd4SetZeroD ( )
inline static

Set all SIMD4 double elements to 0.

You should typically just call gmx::setZero(), which uses proxy objects internally to handle all types rather than adding the suffix used here.

Returns
SIMD4 0.0
static Simd4Float gmx_simdcall gmx::simd4SetZeroF ( )
inline static

Set all SIMD4 float elements to 0.

You should typically just call gmx::setZero(), which uses proxy objects internally to handle all types rather than adding the suffix used here.

Returns
SIMD4 0.0
static SimdFloat gmx_simdcall gmx::simdLoad ( const float *  m,
SimdFloatTag  = {} 
)
inline static

Load GMX_SIMD_FLOAT_WIDTH float numbers from aligned memory.

Parameters
m  Pointer to memory aligned to the SIMD width.
Returns
SIMD variable with data loaded.
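To make the alignment requirement concrete, the sketch below checks whether a pointer is aligned to a full register of floats and models an aligned load in scalar code. The lane count of 4 and all names (FakeSimdFloat, isSimdAlignedSketch, simdLoadSketch) are assumptions for illustration; in GROMACS the width is GMX_SIMD_FLOAT_WIDTH.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical lane count; the real width is GMX_SIMD_FLOAT_WIDTH.
constexpr std::size_t kWidthSketch = 4;
using FakeSimdFloat = std::array<float, kWidthSketch>;

// True if p sits on a boundary of kWidthSketch floats (16 bytes here).
inline bool isSimdAlignedSketch(const float* p)
{
    return reinterpret_cast<std::uintptr_t>(p) % (kWidthSketch * sizeof(float)) == 0;
}

// Scalar model of an aligned load: reads kWidthSketch consecutive floats.
inline FakeSimdFloat simdLoadSketch(const float* m)
{
    assert(isSimdAlignedSketch(m)); // aligned variants require this
    FakeSimdFloat r{};
    for (std::size_t i = 0; i < kWidthSketch; ++i)
    {
        r[i] = m[i];
    }
    return r;
}
```

A buffer declared with `alignas(kWidthSketch * sizeof(float))` satisfies the check at its start and at every multiple of kWidthSketch elements, but not at an offset of one element.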
static SimdDouble gmx_simdcall gmx::simdLoad ( const double *  m,
SimdDoubleTag  = {} 
)
inline static

Load GMX_SIMD_DOUBLE_WIDTH double numbers from aligned memory.

Parameters
m  Pointer to memory aligned to the SIMD width.
Returns
SIMD variable with data loaded.
static SimdFInt32 gmx_simdcall gmx::simdLoad ( const std::int32_t *  m,
SimdFInt32Tag   
)
inline static

Load aligned SIMD integer data, width corresponds to gmx::SimdFloat.

You should typically just call gmx::load(), which uses proxy objects internally to handle all types rather than adding the suffix used here.

Parameters
m  Pointer to memory, aligned to (float) integer SIMD width.
Returns
SIMD integer variable.
static SimdDInt32 gmx_simdcall gmx::simdLoad ( const std::int32_t *  m,
SimdDInt32Tag   
)
inline static

Load aligned SIMD integer data, width corresponds to gmx::SimdDouble.

You should typically just call gmx::load(), which uses proxy objects internally to handle all types rather than adding the suffix used here.

Parameters
m  Pointer to memory, aligned to (double) integer SIMD width.
Returns
SIMD integer variable.
static SimdFloat gmx_simdcall gmx::simdLoadU ( const float *  m,
SimdFloatTag  = {} 
)
inline static

Load SIMD float from unaligned memory.

Available if GMX_SIMD_HAVE_LOADU is 1.

Parameters
m  Pointer to memory, no alignment requirement.
Returns
SIMD variable with data loaded.
static SimdDouble gmx_simdcall gmx::simdLoadU ( const double *  m,
SimdDoubleTag  = {} 
)
inline static

Load SIMD double from unaligned memory.

Available if GMX_SIMD_HAVE_LOADU is 1.

Parameters
m  Pointer to memory, no alignment requirement.
Returns
SIMD variable with data loaded.
static SimdFInt32 gmx_simdcall gmx::simdLoadU ( const std::int32_t *  m,
SimdFInt32Tag   
)
inline static

Load unaligned integer SIMD data, width corresponds to gmx::SimdFloat.

You should typically just call gmx::loadU(), which uses proxy objects internally to handle all types rather than adding the suffix used here.

Available if GMX_SIMD_HAVE_LOADU is 1.

Parameters
m  Pointer to memory, no alignment requirements.
Returns
SIMD integer variable.
static SimdDInt32 gmx_simdcall gmx::simdLoadU ( const std::int32_t *  m,
SimdDInt32Tag   
)
inline static

Load unaligned integer SIMD data, width corresponds to gmx::SimdDouble.

You should typically just call gmx::loadU(), which uses proxy objects internally to handle all types rather than adding the suffix used here.

Available if GMX_SIMD_HAVE_LOADU is 1.

Parameters
m  Pointer to memory, no alignment requirements.
Returns
SIMD integer variable.
static void gmx_simdcall gmx::store ( float *  m,
SimdFloat  a 
)
inline static

Store the contents of SIMD float variable to aligned memory m.

Parameters
[out] m  Pointer to memory, aligned to SIMD width.
a  SIMD variable to store
static void gmx_simdcall gmx::store ( double *  m,
SimdDouble  a 
)
inline static

Store the contents of SIMD double variable to aligned memory m.

Parameters
[out] m  Pointer to memory, aligned to SIMD width.
a  SIMD variable to store
Examples:
template.cpp.
static void gmx_simdcall gmx::store ( std::int32_t *  m,
SimdFInt32  a 
)
inline static

Store aligned SIMD integer data, width corresponds to gmx::SimdFloat.

Parameters
m  Memory aligned to (float) integer SIMD width.
a  SIMD variable to store.
static void gmx_simdcall gmx::store ( std::int32_t *  m,
SimdDInt32  a 
)
inline static

Store aligned SIMD integer data, width corresponds to gmx::SimdDouble.

Parameters
m  Memory aligned to (double) integer SIMD width.
a  SIMD (double) integer variable to store.
static void gmx_simdcall gmx::store4 ( double *  m,
Simd4Double  a 
)
inline static

Store the contents of SIMD4 double to aligned memory m.

Parameters
[out] m  Pointer to memory, aligned to 4 elements.
a  SIMD4 variable to store
static void gmx_simdcall gmx::store4 ( float *  m,
Simd4Float  a 
)
inline static

Store the contents of SIMD4 float to aligned memory m.

Parameters
[out] m  Pointer to memory, aligned to 4 elements.
a  SIMD4 variable to store
static void gmx_simdcall gmx::store4U ( float *  m,
Simd4Float  a 
)
inline static

Store SIMD4 float to unaligned memory.

Available if GMX_SIMD_HAVE_STOREU is 1.

Parameters
[out] m  Pointer to memory, no alignment requirement.
a  SIMD4 variable to store.
static void gmx_simdcall gmx::store4U ( double *  m,
Simd4Double  a 
)
inline static

Store SIMD4 double to unaligned memory.

Available if GMX_SIMD_HAVE_STOREU is 1.

Parameters
[out] m  Pointer to memory, no alignment requirement.
a  SIMD4 variable to store.
static void gmx_simdcall gmx::storeDualHsimd ( double *  m0,
double *  m1,
SimdDouble  a 
)
inline static

Store low & high parts of SIMD double to different locations.

Parameters
m0  Pointer to memory aligned to half SIMD width.
m1  Pointer to memory aligned to half SIMD width.
a  SIMD variable. Low half should be stored to m0, high to m1.

Available if GMX_SIMD_HAVE_HSIMD_UTIL_DOUBLE is 1.
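In scalar terms, the split store behaves like the sketch below. It assumes a width of 4 lanes and that the low half is lanes [0, width/2); FakeSimdDouble and storeDualHsimdSketch are hypothetical names, and alignment is not modelled.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical lane count; the real width is GMX_SIMD_DOUBLE_WIDTH.
constexpr std::size_t kWidthSketch = 4;
constexpr std::size_t kHalfSketch  = kWidthSketch / 2;
using FakeSimdDouble = std::array<double, kWidthSketch>;

// Scalar model: the low half of a is written to m0, the high half to m1.
inline void storeDualHsimdSketch(double* m0, double* m1, const FakeSimdDouble& a)
{
    for (std::size_t i = 0; i < kHalfSketch; ++i)
    {
        m0[i] = a[i];
        m1[i] = a[kHalfSketch + i];
    }
}
```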

static void gmx_simdcall gmx::storeDualHsimd ( float *  m0,
float *  m1,
SimdFloat  a 
)
inline static

Store low & high parts of SIMD float to different locations.

Parameters
m0  Pointer to memory aligned to half SIMD width.
m1  Pointer to memory aligned to half SIMD width.
a  SIMD variable. Low half should be stored to m0, high to m1.

Available if GMX_SIMD_HAVE_HSIMD_UTIL_FLOAT is 1.

static void gmx_simdcall gmx::storeU ( float *  m,
SimdFloat  a 
)
inline static

Store SIMD float to unaligned memory.

Available if GMX_SIMD_HAVE_STOREU is 1.

Parameters
[out] m  Pointer to memory, no alignment requirement.
a  SIMD variable to store.
static void gmx_simdcall gmx::storeU ( double *  m,
SimdDouble  a 
)
inline static

Store SIMD double to unaligned memory.

Available if GMX_SIMD_HAVE_STOREU is 1.

Parameters
[out] m  Pointer to memory, no alignment requirement.
a  SIMD variable to store.
static void gmx_simdcall gmx::storeU ( std::int32_t *  m,
SimdFInt32  a 
)
inline static

Store unaligned SIMD integer data, width corresponds to gmx::SimdFloat.

Available if GMX_SIMD_HAVE_STOREU is 1.

Parameters
m  Memory pointer, no alignment requirements.
a  SIMD variable to store.
static void gmx_simdcall gmx::storeU ( std::int32_t *  m,
SimdDInt32  a 
)
inline static

Store unaligned SIMD integer data, width corresponds to gmx::SimdDouble.

Available if GMX_SIMD_HAVE_STOREU is 1.

Parameters
m  Memory pointer, no alignment requirements.
a  SIMD (double) integer variable to store.
static SimdFBool gmx_simdcall gmx::testBits ( SimdFloat  a)
inline static

Return true if any bits are set in the single precision SIMD.

This function is used to handle bitmasks, mainly for exclusions in the inner kernels. Note that it will return true even for -0.0F (sign bit set), so it is not identical to not-equal.

Parameters
a  value
Returns
Each element of the boolean will be true if any bit in a is nonzero.
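The difference from a not-equal comparison shows up only for negative zero. The scalar sketch below models one lane: it inspects the raw bit pattern, so -0.0F (sign bit set) counts as "bits set" even though it compares equal to zero. testBitsSketch and notEqualZeroSketch are hypothetical names, and a 32-bit IEEE float is assumed.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Scalar model of testBits() for one float lane: true if any bit is set.
inline bool testBitsSketch(float a)
{
    std::uint32_t bits = 0;
    static_assert(sizeof(bits) == sizeof(a), "float is assumed to be 32 bits");
    std::memcpy(&bits, &a, sizeof(bits)); // well-defined way to read the bit pattern
    return bits != 0U;
}

// Ordinary floating-point comparison, for contrast.
inline bool notEqualZeroSketch(float a)
{
    return a != 0.0F;
}
```

For 0.0F both functions return false and for 1.0F both return true; only for -0.0F do they disagree.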
static SimdDBool gmx_simdcall gmx::testBits ( SimdDouble  a)
inline static

Return true if any bits are set in the double precision SIMD.

This function is used to handle bitmasks, mainly for exclusions in the inner kernels. Note that it will return true even for -0.0 (sign bit set), so it is not identical to not-equal.

Parameters
a  value
Returns
Each element of the boolean will be true if any bit in a is nonzero.
static SimdFIBool gmx_simdcall gmx::testBits ( SimdFInt32  a)
inline static

Check if any bit is set in each element.

Available if GMX_SIMD_HAVE_FINT32_ARITHMETICS is 1.

Parameters
a  SIMD integer
Returns
SIMD integer boolean with true for elements where any bit is set
static SimdDIBool gmx_simdcall gmx::testBits ( SimdDInt32  a)
inline static

Check if any bit is set in each element.

Available if GMX_SIMD_HAVE_DINT32_ARITHMETICS is 1.

Parameters
a  SIMD integer
Returns
SIMD integer boolean with true for elements where any bit is set
static void gmx_simdcall gmx::transpose ( Simd4Float *  v0,
Simd4Float *  v1,
Simd4Float *  v2,
Simd4Float *  v3 
)
inline static

SIMD4 float transpose.

Parameters
[in,out] v0  Row 0 on input, column 0 on output
[in,out] v1  Row 1 on input, column 1 on output
[in,out] v2  Row 2 on input, column 2 on output
[in,out] v3  Row 3 on input, column 3 on output
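The row-to-column behaviour of the four in/out arguments can be modelled with an in-place 4x4 transpose, as in the scalar sketch below. FakeSimd4Float and transposeSketch are hypothetical names; a real implementation would use shuffle instructions rather than element swaps.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <utility>

using FakeSimd4Float = std::array<float, 4>; // stand-in for a SIMD4 variable

// In-place 4x4 transpose: on return, v0..v3 hold the former columns 0..3.
inline void transposeSketch(FakeSimd4Float* v0,
                            FakeSimd4Float* v1,
                            FakeSimd4Float* v2,
                            FakeSimd4Float* v3)
{
    FakeSimd4Float* rows[4] = { v0, v1, v2, v3 };
    for (std::size_t i = 0; i < 4; ++i)
    {
        for (std::size_t j = i + 1; j < 4; ++j)
        {
            // Swap element (i, j) with element (j, i).
            std::swap((*rows[i])[j], (*rows[j])[i]);
        }
    }
}
```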
static void gmx_simdcall gmx::transpose ( Simd4Double *  v0,
Simd4Double *  v1,
Simd4Double *  v2,
Simd4Double *  v3 
)
inline static

SIMD4 double transpose.

Parameters
[in,out] v0  Row 0 on input, column 0 on output
[in,out] v1  Row 1 on input, column 1 on output
[in,out] v2  Row 2 on input, column 2 on output
[in,out] v3  Row 3 on input, column 3 on output
template<int align>
static void gmx_simdcall gmx::transposeScatterDecrU ( double *  base,
const std::int32_t  offset[],
SimdDouble  v0,
SimdDouble  v1,
SimdDouble  v2 
)
inline static

Transpose 3 SIMD doubles and subtract them from 3 consecutive addresses at GMX_SIMD_DOUBLE_WIDTH offsets.

Template Parameters
align  Alignment of the memory to which we write, i.e. distance (measured in elements, not bytes) between index points. When this is identical to the number of SIMD variables (i.e., 3 for this routine) the output data is packed without padding in memory. See the SIMD parameters for exactly what memory positions are decremented.
Parameters
[out] base  Pointer to start of memory.
offset  Aligned array with offsets to the start of each triplet.
v0  1st component, subtracted from base[align*offset[i]]
v1  2nd component, subtracted from base[align*offset[i]+1]
v2  3rd component, subtracted from base[align*offset[i]+2]

This function can work with both aligned (better performance) and unaligned memory. When the align parameter is not a power-of-two (align==3 would be normal for packed atomic coordinates) the memory obviously cannot be aligned, and we account for this. However, in the case where align is a power-of-two, we assume the base pointer also has the same alignment, which will enable many platforms to use faster aligned memory load/store operations. An easy way to think of this is that each triplet of data in memory must be aligned to the align parameter you specify when it's a power-of-two.

The offset memory must always be aligned to GMX_SIMD_FINT32_WIDTH, since this enables us to use SIMD loads and gather operations on platforms that support it.

Note
You should NOT scale offsets before calling this routine; it is done internally by using the alignment template parameter instead.
This routine uses a normal array for the offsets, since we typically load the data from memory. On the architectures we have tested this is faster even when a SIMD integer datatype is present.
To improve performance, this function might use full-SIMD-width unaligned load/store, and subtract 0.0 from the extra elements. This means you need to ensure the memory is padded at the end, so we always can load GMX_SIMD_REAL_WIDTH elements starting at the last offset. If you use the Gromacs aligned memory allocation routines this will always be the case.
template<int align>
static void gmx_simdcall gmx::transposeScatterDecrU ( float *  base,
const std::int32_t  offset[],
SimdFloat  v0,
SimdFloat  v1,
SimdFloat  v2 
)
inline static

Transpose 3 SIMD floats and subtract them from 3 consecutive addresses at GMX_SIMD_FLOAT_WIDTH offsets.

Template Parameters
align  Alignment of the memory to which we write, i.e. distance (measured in elements, not bytes) between index points. When this is identical to the number of SIMD variables (i.e., 3 for this routine) the output data is packed without padding in memory. See the SIMD parameters for exactly what memory positions are decremented.
Parameters
[out] base  Pointer to start of memory.
offset  Aligned array with offsets to the start of each triplet.
v0  1st component, subtracted from base[align*offset[i]]
v1  2nd component, subtracted from base[align*offset[i]+1]
v2  3rd component, subtracted from base[align*offset[i]+2]

This function can work with both aligned (better performance) and unaligned memory. When the align parameter is not a power-of-two (align==3 would be normal for packed atomic coordinates) the memory obviously cannot be aligned, and we account for this. However, in the case where align is a power-of-two, we assume the base pointer also has the same alignment, which will enable many platforms to use faster aligned memory load/store operations. An easy way to think of this is that each triplet of data in memory must be aligned to the align parameter you specify when it's a power-of-two.

The offset memory must always be aligned to GMX_SIMD_FINT32_WIDTH, since this enables us to use SIMD loads and gather operations on platforms that support it.

Note
You should NOT scale offsets before calling this routine; it is done internally by using the alignment template parameter instead.
This routine uses a normal array for the offsets, since we typically load the data from memory. On the architectures we have tested this is faster even when a SIMD integer datatype is present.
To improve performance, this function might use full-SIMD-width unaligned load/store, and subtract 0.0 from the extra elements. This means you need to ensure the memory is padded at the end, so we always can load GMX_SIMD_REAL_WIDTH elements starting at the last offset. If you use the Gromacs aligned memory allocation routines this will always be the case.
template<int align>
static void gmx_simdcall gmx::transposeScatterIncrU ( double *  base,
const std::int32_t  offset[],
SimdDouble  v0,
SimdDouble  v1,
SimdDouble  v2 
)
inline static

Transpose and add 3 SIMD doubles to 3 consecutive addresses at GMX_SIMD_DOUBLE_WIDTH offsets.

Template Parameters
align  Alignment of the memory to which we write, i.e. distance (measured in elements, not bytes) between index points. When this is identical to the number of SIMD variables (i.e., 3 for this routine) the output data is packed without padding in memory. See the SIMD parameters for exactly what memory positions are incremented.
Parameters
[out] base  Pointer to the start of the memory area
offset  Aligned array with offsets to the start of each triplet.
v0  1st component of triplets, added to base[align*offset[i]].
v1  2nd component of triplets, added to base[align*offset[i] + 1].
v2  3rd component of triplets, added to base[align*offset[i] + 2].

This function can work with both aligned (better performance) and unaligned memory. When the align parameter is not a power-of-two (align==3 would be normal for packed atomic coordinates) the memory obviously cannot be aligned, and we account for this. However, in the case where align is a power-of-two, we assume the base pointer also has the same alignment, which will enable many platforms to use faster aligned memory load/store operations. An easy way to think of this is that each triplet of data in memory must be aligned to the align parameter you specify when it's a power-of-two.

The offset memory must always be aligned to GMX_SIMD_FINT32_WIDTH, since this enables us to use SIMD loads and gather operations on platforms that support it.

Note
You should NOT scale offsets before calling this routine; it is done internally by using the alignment template parameter instead.
This routine uses a normal array for the offsets, since we typically load the data from memory. On the architectures we have tested this is faster even when a SIMD integer datatype is present.
To improve performance, this function might use full-SIMD-width unaligned load/store, and add 0.0 to the extra elements. This means you need to ensure the memory is padded at the end, so we always can load GMX_SIMD_REAL_WIDTH elements starting at the last offset. If you use the Gromacs aligned memory allocation routines this will always be the case.
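The addressing rule base[align*offset[i] + k] can be modelled in scalar code as below, which may help when checking that offsets are passed unscaled. This is a reference sketch only: FakeSimdDouble, kWidthSketch and transposeScatterIncrUSketch are hypothetical names, the width of 4 is arbitrary, and none of the alignment, padding or full-width load/store behaviour described above is modelled.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical lane count; the real width is GMX_SIMD_DOUBLE_WIDTH.
constexpr std::size_t kWidthSketch = 4;
using FakeSimdDouble = std::array<double, kWidthSketch>;

// Scalar model: for each lane i, add (v0[i], v1[i], v2[i]) to the triplet
// starting at base[align * offset[i]]. The scaling by align happens here,
// so the caller passes plain (unscaled) indices in offset.
template<int align>
inline void transposeScatterIncrUSketch(double*               base,
                                        const std::int32_t    offset[],
                                        const FakeSimdDouble& v0,
                                        const FakeSimdDouble& v1,
                                        const FakeSimdDouble& v2)
{
    for (std::size_t i = 0; i < kWidthSketch; ++i)
    {
        double* p = base + static_cast<std::ptrdiff_t>(align) * offset[i];
        p[0] += v0[i];
        p[1] += v1[i];
        p[2] += v2[i];
    }
}
```

With align equal to 3 and offsets 0, 2, 3 and 5, the routine updates the packed x/y/z triplets of atoms 0, 2, 3 and 5 and leaves the other atoms untouched.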
template<int align>
static void gmx_simdcall gmx::transposeScatterIncrU ( float *  base,
const std::int32_t  offset[],
SimdFloat  v0,
SimdFloat  v1,
SimdFloat  v2 
)
inline static

Transpose and add 3 SIMD floats to 3 consecutive addresses at GMX_SIMD_FLOAT_WIDTH offsets.

Template Parameters
align  Alignment of the memory to which we write, i.e. distance (measured in elements, not bytes) between index points. When this is identical to the number of SIMD variables (i.e., 3 for this routine) the output data is packed without padding in memory. See the SIMD parameters for exactly what memory positions are incremented.
Parameters
[out] base  Pointer to the start of the memory area
offset  Aligned array with offsets to the start of each triplet.
v0  1st component of triplets, added to base[align*offset[i]].
v1  2nd component of triplets, added to base[align*offset[i] + 1].
v2  3rd component of triplets, added to base[align*offset[i] + 2].

This function can work with both aligned (better performance) and unaligned memory. When the align parameter is not a power-of-two (align==3 would be normal for packed atomic coordinates) the memory obviously cannot be aligned, and we account for this. However, in the case where align is a power-of-two, we assume the base pointer also has the same alignment, which will enable many platforms to use faster aligned memory load/store operations. An easy way to think of this is that each triplet of data in memory must be aligned to the align parameter you specify when it's a power-of-two.

The offset memory must always be aligned to GMX_SIMD_FINT32_WIDTH, since this enables us to use SIMD loads and gather operations on platforms that support it.

Note
You should NOT scale offsets before calling this routine; it is done internally by using the alignment template parameter instead.
This routine uses a normal array for the offsets, since we typically load the data from memory. On the architectures we have tested this is faster even when a SIMD integer datatype is present.
To improve performance, this function might use full-SIMD-width unaligned load/store, and add 0.0 to the extra elements. This means you need to ensure the memory is padded at the end, so we always can load GMX_SIMD_REAL_WIDTH elements starting at the last offset. If you use the Gromacs aligned memory allocation routines this will always be the case.
template<int align>
static void gmx_simdcall gmx::transposeScatterStoreU ( double *  base,
const std::int32_t  offset[],
SimdDouble  v0,
SimdDouble  v1,
SimdDouble  v2 
)
inline static

Transpose and store 3 SIMD doubles to 3 consecutive addresses at GMX_SIMD_DOUBLE_WIDTH offsets.

Template Parameters
align  Alignment of the memory to which we write, i.e. distance (measured in elements, not bytes) between index points. When this is identical to the number of SIMD variables (i.e., 3 for this routine) the output data is packed without padding in memory. See the SIMD parameters for exactly what memory positions are written.
Parameters
[out] base  Pointer to the start of the memory area
offset  Aligned array with offsets to the start of each triplet.
v0  1st component of triplets, written to base[align*offset[i]].
v1  2nd component of triplets, written to base[align*offset[i] + 1].
v2  3rd component of triplets, written to base[align*offset[i] + 2].

This function can work with both aligned (better performance) and unaligned memory. When the align parameter is not a power-of-two (align==3 would be normal for packed atomic coordinates) the memory obviously cannot be aligned, and we account for this. However, in the case where align is a power-of-two, we assume the base pointer also has the same alignment, which will enable many platforms to use faster aligned memory store operations. An easy way to think of this is that each triplet of data in memory must be aligned to the align parameter you specify when it's a power-of-two.

The offset memory must always be aligned to GMX_SIMD_FINT32_WIDTH, since this enables us to use SIMD loads and gather operations on platforms that support it.

Note
You should NOT scale offsets before calling this routine; it is done internally by using the alignment template parameter instead.
This routine uses a normal array for the offsets, since we typically load the data from memory. On the architectures we have tested this is faster even when a SIMD integer datatype is present.
template<int align>
static void gmx_simdcall gmx::transposeScatterStoreU ( float *  base,
const std::int32_t  offset[],
SimdFloat  v0,
SimdFloat  v1,
SimdFloat  v2 
)
inline static

Transpose and store 3 SIMD floats to 3 consecutive addresses at GMX_SIMD_FLOAT_WIDTH offsets.

Template Parameters
align  Alignment of the memory to which we write, i.e. distance (measured in elements, not bytes) between index points. When this is identical to the number of SIMD variables (i.e., 3 for this routine) the output data is packed without padding in memory. See the SIMD parameters for exactly what memory positions are written.
Parameters
[out] base  Pointer to the start of the memory area
offset  Aligned array with offsets to the start of each triplet.
v0  1st component of triplets, written to base[align*offset[i]].
v1  2nd component of triplets, written to base[align*offset[i] + 1].
v2  3rd component of triplets, written to base[align*offset[i] + 2].

This function works with both aligned memory (better performance) and unaligned memory. When the align parameter is not a power of two (align==3 is typical for packed atomic coordinates) the memory cannot be aligned, and the routine accounts for this. When align is a power of two, the base pointer is assumed to have the same alignment, which lets many platforms use faster aligned memory store operations. Put differently, when align is a power of two, each triplet of data in memory must be aligned to the align parameter you specify.

The offset memory must always be aligned to GMX_SIMD_FINT32_WIDTH, since this enables us to use SIMD loads and gather operations on platforms that support them.

Note
You should NOT scale offsets before calling this routine; it is done internally by using the alignment template parameter instead.
This routine uses a normal array for the offsets, since we typically load the data from memory. On the architectures we have tested this is faster even when a SIMD integer datatype is present.
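
The alignment assumptions above can be checked on the caller side before the routine is invoked. The helpers below are hypothetical and not part of the SIMD module. They read "the same alignment" as "a multiple of align elements", which is an interpretation of the text above, and they take the integer SIMD width as a plain argument instead of using GMX_SIMD_FINT32_WIDTH.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// True when n is a positive power of two.
constexpr bool isPowerOfTwo(int n)
{
    return n > 0 && (n & (n - 1)) == 0;
}

// True when p sits on a multiple of (elements * sizeof(T)) bytes.
template<typename T>
bool isAlignedToElements(const T* p, std::size_t elements)
{
    assert(elements > 0);
    return reinterpret_cast<std::uintptr_t>(p) % (elements * sizeof(T)) == 0;
}

// Hypothetical precondition check for the base pointer:
// only power-of-two strides carry an alignment requirement in this sketch.
template<int align>
bool basePointerLooksValid(const float* base)
{
    return !isPowerOfTwo(align) || isAlignedToElements(base, align);
}
```

A debug build could, for example, assert basePointerLooksValid<4>(base) and isAlignedToElements(offset, width) before a call, where width is whatever integer SIMD width the build uses. For align==3 the base-pointer check always passes, matching the statement that packed triplets cannot be aligned.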
static Simd4Float gmx_simdcall gmx::trunc ( Simd4Float  a)
inline static

Truncate SIMD4, i.e. round towards zero - common hardware instruction.

Parameters
a  Any floating-point value
Returns
Integer rounded towards zero, represented in floating-point format.
Note
This is truncation towards zero, not floor(). The reason for this is that truncation is virtually always present as a dedicated hardware instruction, but floor() frequently isn't.
static Simd4Double gmx_simdcall gmx::trunc ( Simd4Double  a)
inline static

Truncate SIMD4, i.e. round towards zero - common hardware instruction.

Parameters
a  Any floating-point value
Returns
Integer rounded towards zero, represented in floating-point format.
Note
This is truncation towards zero, not floor(). The reason for this is that truncation is virtually always present as a dedicated hardware instruction, but floor() frequently isn't.
static SimdFloat gmx_simdcall gmx::trunc ( SimdFloat  a)
inline static

Truncate SIMD float, i.e. round towards zero - common hardware instruction.

Parameters
a  Any floating-point value
Returns
Integer rounded towards zero, represented in floating-point format.
Note
This is truncation towards zero, not floor(). The reason for this is that truncation is virtually always present as a dedicated hardware instruction, but floor() frequently isn't.
static SimdDouble gmx_simdcall gmx::trunc ( SimdDouble  a)
inline static

Truncate SIMD double, i.e. round towards zero - common hardware instruction.

Parameters
a  Any floating-point value
Returns
Integer rounded towards zero, represented in floating-point format.
Note
This is truncation towards zero, not floor(). The reason for this is that truncation is virtually always present as a dedicated hardware instruction, but floor() frequently isn't.
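
The difference between truncation and floor() only shows up for negative non-integer inputs. The scalar sketch below illustrates this per lane and shows one way floor() could be derived from truncation with a compare and a subtract. FloatLanes, truncModel and floorFromTruncModel are hypothetical names; the code uses std::trunc on a std::array and is not how the SIMD module implements either operation.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>

// Hypothetical scalar stand-in for a SIMD float register.
template<std::size_t width>
using FloatLanes = std::array<float, width>;

// Per-lane round towards zero, e.g. 1.7 -> 1.0 and -1.7 -> -1.0.
template<std::size_t width>
FloatLanes<width> truncModel(const FloatLanes<width>& a)
{
    FloatLanes<width> r{};
    for (std::size_t i = 0; i < width; i++)
    {
        r[i] = std::trunc(a[i]);
    }
    return r;
}

// floor() built from truncation: subtract one where truncation rounded upwards,
// which only happens for negative non-integer lanes.
template<std::size_t width>
FloatLanes<width> floorFromTruncModel(const FloatLanes<width>& a)
{
    FloatLanes<width> r{};
    for (std::size_t i = 0; i < width; i++)
    {
        const float t = std::trunc(a[i]);
        r[i]          = (a[i] < t) ? t - 1.0f : t;
        // Sanity check: a floor value never exceeds its input (NaN lanes are skipped).
        assert(std::isnan(a[i]) || r[i] <= a[i]);
    }
    return r;
}
```

For the input {1.7, -1.7, -2.0, 0.0}, truncModel gives {1, -1, -2, 0} while floorFromTruncModel gives {1, -2, -2, 0}; only the second lane differs.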

Variable Documentation

const int gmx::c_simdBestPairAlignmentDouble = 2
static

Best alignment to use for aligned pairs of double data.

The routines to load and transpose data will work with a wide range of alignments, but some might be faster than others, depending on the load instructions available in the hardware. This specifies the best alignment for each implementation when working with pairs of data.

To let each architecture use its optimal layout, the SIMD module provides this constant, which code outside the module should use to store pairs properly. It must be at least 2. For example, a value of 2 means the two parameters A and B are stored as [A0 B0 A1 B1], while a value of 4 means [A0 B0 - - A1 B1 - -].

This alignment depends on the efficiency of partial-register load/store operations, and therefore on the architecture.
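
The two layouts mentioned above can be produced by a small packing helper. This is a minimal sketch under stated assumptions: packPairs is a hypothetical function, not part of GROMACS, and padding slots are filled with zero here although the documentation does not say what they should contain.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: interleave pairs (A_i, B_i) so that pair i starts at
// element pairAlign * i. Slots beyond the first two of each pair are padding.
template<typename T>
std::vector<T> packPairs(const std::vector<T>& a, const std::vector<T>& b, std::size_t pairAlign)
{
    assert(pairAlign >= 2);       // a pair needs room for both members
    assert(a.size() == b.size()); // one B for every A

    std::vector<T> packed(a.size() * pairAlign, T(0)); // padding is zero (assumption)
    for (std::size_t i = 0; i < a.size(); i++)
    {
        packed[pairAlign * i]     = a[i];
        packed[pairAlign * i + 1] = b[i];
    }
    return packed;
}
```

Calling packPairs with a stride of 2 yields [A0 B0 A1 B1], and a stride of 4 yields [A0 B0 0 0 A1 B1 0 0]. Code outside the SIMD module would pass the pair-alignment constant for its precision as the stride rather than a literal, so that the layout follows whatever the active SIMD implementation prefers.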

const int gmx::c_simdBestPairAlignmentFloat = 2
static

Best alignment to use for aligned pairs of float data.

The routines to load and transpose data will work with a wide range of alignments, but some might be faster than others, depending on the load instructions available in the hardware. This specifies the best alignment for each implementation when working with pairs of data.

To let each architecture use its optimal layout, the SIMD module provides this constant, which code outside the module should use to store pairs properly. It must be at least 2. For example, a value of 2 means the two parameters A and B are stored as [A0 B0 A1 B1], while a value of 4 means [A0 B0 - - A1 B1 - -].

This alignment depends on the efficiency of partial-register load/store operations, and therefore on the architecture.