New low-level SIMD support
============================

This is only of interest for people porting GROMACS to
not-yet-widely-used hardware.

Added AVX-512ER SIMD support
------------------------------------------------------------

This SIMD architecture adds 28-bit accurate table lookups for 1/x and
1/sqrt(x), as well as a 23-bit accurate version of exp(x). This is
likely to be the architecture used by the first Knights Landing MIC,
while standard AVX-512F is reserved for future Xeon x86 processors.
Just as for the other low-level implementations, this does not include
any kernels and will not be enabled automatically yet.

Added AVX-512F SIMD support
------------------------------------------------------------

This adds cpuid detection for AVX-512F and the low-level SIMD
implementation. AVX-512F is not yet present in any shipping hardware,
but the Knights Landing version of Intel MIC will be the first CPU to
support it, followed by future generations of Intel's normal x86 CPUs
(i.e., it merges MIC and x86 SIMD). The implementation currently only
works with icc, and passes unit tests in the Intel SDE emulator. Some
kernel support functions are not yet implemented, and the instruction
set will not be enabled automatically yet.

Added Power8 VSX SIMD support
------------------------------------------------------------

This adds the low-level SIMD implementation for IBM VSX, which is
present on both Power7 and Power8. It passes unit tests with both
gcc-4.9 and IBM xlc on Power7 (which is always big endian), and with
gcc-4.8 and gcc-4.9 on Power8 running Linux in little-endian mode. It
is not yet enabled automatically since we still lack nbnxn kernels for
this architecture.

Added Power/PowerPC VMX SIMD support
------------------------------------------------------------

This adds the low-level SIMD implementation for IBM VMX, which is
present on Power6 and later (and PowerPC after PPC970). This will not
generate nbnxn kernels yet, so it is not enabled by default.
Unit tests pass on Power8, where I also had to work around some
vec_lvsl() issues due to the new little-endian PowerPC architecture.

IBM has some quirks in their implementation of fnmadd, which
occasionally returns negative zero when IEEE 754 would expect positive
zero. This should not cause any problems in normal GROMACS use. To
avoid problems in SIMD-math, I have changed some operations there to
use fmadd with swapped signs of constants instead. This might even
save a cycle on some platforms.

Added 64-bit AArch64 asimd SIMD support
------------------------------------------------------------

This adds the low-level SIMD implementation for the 64-bit ARM AArch64
architecture in single and double precision. We use the asimd (advanced
SIMD) nomenclature that is also present in the CPU flags, but this is
the same as AArch64-neon, and it is present on all AArch64 hardware.
Just as for the 32-bit ARM Neon support in the parent patch, this will
not generate kernels yet, and for this reason we do not yet enable
AARCH64_ASIMD by default. Unit and regression tests pass on AArch64
hardware with gcc-4.9.

Added 32-bit ARM Neon SIMD support
------------------------------------------------------------

This adds the low-level SIMD implementation for 32-bit ARM Neon
instructions. We will still not generate nbnxn kernels for it; that is
coming in a future update. For this reason we will also not enable
ARM_NEON automatically in GMX_SIMD yet. The port passes our unit tests
on tcbl04.theophys.kth.se (ARMv7).
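
The negative-zero quirk noted for the Power/PowerPC VMX port can be
reproduced with plain scalar arithmetic. The sketch below is not
GROMACS code. It assumes the hardware forms fnmadd as the negation of
a multiply-subtract, -(a*b - c), and the function names are
hypothetical, chosen only for illustration. When a*b equals c exactly,
that form yields -0.0, while a multiply-add with the sign moved onto an
operand yields +0.0.

```python
import math


def fnmadd_negate_last(a, b, c):
    """Assumed hardware form: negate the result of a*b - c."""
    return -(a * b - c)


def fnmadd_via_fmadd(a, b, c):
    """Workaround form: multiply-add with the sign moved onto an operand."""
    return (-a) * b + c


def is_negative_zero(x):
    """True only for -0.0; copysign exposes the sign bit that == hides."""
    return x == 0.0 and math.copysign(1.0, x) < 0.0


if __name__ == "__main__":
    # a*b and c cancel exactly, so both forms return a zero; only the sign differs.
    a, b, c = 2.0, 3.0, 6.0
    for form in (fnmadd_negate_last, fnmadd_via_fmadd):
        result = form(a, b, c)
        print(form.__name__, repr(result), is_negative_zero(result))
```

The two results compare equal, so the difference only matters to code
that inspects the sign bit or divides by the result, which is
consistent with the note that normal GROMACS use is unaffected.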