Extended Description
On POWER8 and below, this loop compiles to the same code as vec_abs, but on POWER9 the generated code is considerably worse. Compile with -mcpu=power9 -O3. Example (also on Compiler Explorer: https://godbolt.org/z/avnTxh9M6):
#include <stdint.h>
typedef int8_t i8x16 __attribute__((__vector_size__(16)));
i8x16
i8x16_abs(i8x16 a) {
  i8x16 r;
  for (int i = 0 ; i < 16 ; i++) {
    r[i] = (a[i] < 0) ? -a[i] : a[i];
  }
  return r;
}
llvm-mca reports an RThroughput (reciprocal throughput, lower is better) of 8 for the POWER9 output, vs 1.5 for the POWER8 version.
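For comparison, the vec_abs form that the loop is expected to match could be written roughly as below. This is only a sketch: the names i8x16_abs_loop, i8x16_abs_intrin and i8x16_abs_agree are hypothetical and not part of the original report, and the non-AltiVec fallback branch is an assumption added so that the file also builds on other targets.

```c
#include <stdint.h>
#if defined(__ALTIVEC__)
#include <altivec.h>
#endif

typedef int8_t i8x16 __attribute__((__vector_size__(16)));

/* Copy of the scalar loop from the report, renamed (hypothetical name)
   so that this listing compiles on its own. */
i8x16 i8x16_abs_loop(i8x16 a) {
  i8x16 r;
  for (int i = 0; i < 16; i++) {
    r[i] = (a[i] < 0) ? -a[i] : a[i];
  }
  return r;
}

/* Intrinsic form (hypothetical name). On AltiVec/VSX targets this calls
   vec_abs; elsewhere it falls back to the loop, which is an assumption
   made purely for portability of the sketch. */
i8x16 i8x16_abs_intrin(i8x16 a) {
#if defined(__ALTIVEC__)
  return (i8x16) vec_abs((__vector signed char) a);
#else
  return i8x16_abs_loop(a);
#endif
}

/* Returns 1 if both forms produce the same value in all 16 lanes,
   0 otherwise (hypothetical helper). */
int i8x16_abs_agree(i8x16 a) {
  i8x16 x = i8x16_abs_loop(a);
  i8x16 y = i8x16_abs_intrin(a);
  for (int i = 0; i < 16; i++) {
    if (x[i] != y[i]) {
      return 0;
    }
  }
  return 1;
}
```

Building this with -mcpu=power8 -O3 and then -mcpu=power9 -O3 and diffing the assembly for i8x16_abs_loop against i8x16_abs_intrin should show whether the two forms still lower to the same instructions on each target.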