slow code for absolute value of int8 x 16 vector on POWER9 at -O3 #50249

@llvmbot

Bugzilla Link 50905
Version trunk
OS Windows NT
Reporter LLVM Bugzilla Contributor
CC @efriedma-quic,@bzEq,@nemanjai

Extended Description

On POWER8 and below, this code generates the same code as vec_abs, but on POWER9 the generated code is quite terrible. To reproduce, compile with -mcpu=power9 -O3. Example (also on Compiler Explorer: https://godbolt.org/z/avnTxh9M6):

#include <stdint.h>

typedef int8_t i8x16 __attribute__((__vector_size__(16)));

i8x16
i8x16_abs(i8x16 a) {
    i8x16 r;

    for (int i = 0 ; i < 16 ; i++) {
        r[i] = (a[i] < 0) ? -a[i] : a[i];
    }

    return r;
}

llvm-mca reports an RThroughput of 8 for the POWER9 output, vs 1.5 for the POWER8 version (lower is better).
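
For comparison, here is a minimal sketch of the same operation written without the per-lane loop, using only GCC/Clang vector extensions. It is not from the original report: the name `i8x16_abs_branchless` and the `u8x16` typedef are hypothetical, and whether this form avoids the slow POWER9 sequence has not been checked here. It may still be handy as a second test case when comparing the output of the two -mcpu settings.

```c
#include <stdint.h>

typedef int8_t  i8x16 __attribute__((__vector_size__(16)));
typedef uint8_t u8x16 __attribute__((__vector_size__(16)));  /* hypothetical helper type */

/* Hypothetical helper, not part of the report: per-lane absolute value
   computed without a loop or a select. */
i8x16
i8x16_abs_branchless(i8x16 a) {
    const i8x16 zero = {0};

    /* Lane-wise comparison: each lane is all-ones (0xFF) where a[i] < 0,
       and zero elsewhere. */
    u8x16 mask = (u8x16)(a < zero);

    /* (x ^ mask) - mask negates the negative lanes (two's complement) and
       leaves the others unchanged. The arithmetic is done on unsigned
       lanes so that wraparound is well defined. */
    u8x16 r = ((u8x16)a ^ mask) - mask;

    return (i8x16)r;
}
```

Like the loop above when built with GCC or Clang, this maps INT8_MIN to itself, since 128 does not fit in an int8_t lane.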