Re: HEADS UP: plan to switch many ports over to GCC 12 soon

To: Rhialto <rhialto%falu.nl@localhost>
Subject: Re: HEADS UP: plan to switch many ports over to GCC 12 soon
From: RVP <rvp%SDF.ORG@localhost>
Date: Tue, 2 Jul 2024 05:39:12 +0000 (UTC)

On Mon, 1 Jul 2024, Rhialto wrote:

Summary: it seems that libdeflate explicitly uses some gcc-intrinsic
functions on gcc 11+, and it is expected that the assembler can handle
the result.  So this doesn't depend on a compiler option and it looks
like we cannot disable it either.

Long version:

This is from .work.log for adler32.c, the offending source file.

[*] cc -DLIBDEFLATE_DLL -Dlibdeflate_shared_EXPORTS -I/tmp/pkgsrc/devel/libdeflate/default/libdeflate-1.20 -O2 -I/usr/pkg/include -I/usr/include -O2 -DNDEBUG -std=gnu99 -fPIC -fvisibility=hidden -Wall -Wdeclaration-after-statement -Wimplicit-fallthrough -Wmissing-field-initializers -Wmissing-prototypes -Wpedantic -Wshadow -Wstrict-prototypes -Wundef -Wvla -MD -MT CMakeFiles/libdeflate_shared.dir/lib/adler32.c.o -MF CMakeFiles/libdeflate_shared.dir/lib/adler32.c.o.d -o CMakeFiles/libdeflate_shared.dir/lib/adler32.c.o -c /tmp/pkgsrc/devel/libdeflate/default/libdeflate-1.20/lib/adler32.c
<.> /tmp/pkgsrc/devel/libdeflate/default/.gcc/bin/gcc -fcommon -fstack-protector-strong -D_FORTIFY_SOURCE=2 -Wl,-zrelro -fPIC -DLIBDEFLATE_DLL -Dlibdeflate_shared_EXPORTS -I/tmp/pkgsrc/devel/libdeflate/default/libdeflate-1.20 -O2 -I/tmp/pkgsrc/devel/libdeflate/default/.buildlink/include -O2 -DNDEBUG -std=gnu99 -fPIC -fvisibility=hidden -Wall -Wdeclaration-after-statement -Wimplicit-fallthrough -Wmissing-field-initializers -Wmissing-prototypes -Wpedantic -Wshadow -Wstrict-prototypes -Wundef -Wvla -MD -MT CMakeFiles/libdeflate_shared.dir/lib/adler32.c.o -MF CMakeFiles/libdeflate_shared.dir/lib/adler32.c.o.d -o CMakeFiles/libdeflate_shared.dir/lib/adler32.c.o -c /tmp/pkgsrc/devel/libdeflate/default/libdeflate-1.20/lib/adler32.c

pkg_comp:default.conf# ls -l /tmp/pkgsrc/devel/libdeflate/default/.gcc/bin/gcc
lrwxr-xr-x  1 root  wheel  22 Jul  1 08:30 /tmp/pkgsrc/devel/libdeflate/default/.gcc/bin/gcc -> /usr/pkg/gcc12/bin/gcc

The

/tmp//cceAo651.s:1377: Error: unsupported instruction `vpdpbusd'

seems to be the result of code, included from libdeflate-1.20/lib/adler32_impl.h:

/*
* AVX-VNNI implementation.  This is used on CPUs that have AVX2 and AVX-VNNI
* but don't have AVX-512, for example Intel Alder Lake.
*/
#if GCC_PREREQ(11, 1) || CLANG_PREREQ(12, 0, 13000000) || MSVC_PREREQ(1930)
#  define adler32_x86_avx2_vnni adler32_x86_avx2_vnni
#  define SUFFIX                           _avx2_vnni
#  define ATTRIBUTES            _target_attribute("avx2,avxvnni")
#  define VL                    32
#  define USE_VNNI              1
#  define USE_AVX512            0
#  include "adler32_template.h"
#endif

libdeflate-1.20/lib/x86/adler32_template.h has

#if USE_VNNI
       /*
        * This is Adler-32 using the vpdpbusd instruction from AVX512VNNI or
        * AVX-VNNI.  vpdpbusd multiplies the unsigned bytes of one vector by
        * the signed bytes of another vector and adds the sums in groups of 4
        * to the 32-bit elements of a third vector.  We use it in two ways:
        * multiplying the data bytes by a sequence like 64,63,62,...,1 for
        * calculating part of s2, and multiplying the data bytes by an all-ones
        * sequence 1,1,1,...,1 for calculating s1 and part of s2.  The all-ones
        * trick seems to be faster than the alternative of vpsadbw + vpaddd.
        */

...
                               v_s2   = VDPBUSD(v_s2,   data_a, mults);
and several more such calls
(VDPBUSD isn't exactly the same word as vpdpbusd but it's the closest one)

and this value for VDPBUSD, chosen because VL == 32 and USE_AVX512 == 0.
#    define VDPBUSD(a, b, c)    _mm256_dpbusd_avx_epi32((a), (b), (c))


So it seems that gcc12 is known to have this intrinsic function, some people
are using it, and the assembler is expected to cope somehow.
It doesn't seem like that gcc generates this instruction spontaneously.

/usr/pkg/gcc12/lib/gcc/x86_64--netbsd/12.3.0/include/avxvnniintrin.h has

extern __inline __m256i
__attribute__((__gnu_inline__, __always_inline__, __artificial__))
_mm256_dpbusd_avx_epi32(__m256i __A, __m256i __B, __m256i __C)
{
 return   (__m256i) __builtin_ia32_vpdpbusd_v8si ((__v8si) __A,
                                                  (__v8si) __B,
                                                  (__v8si) __C);
}


In some cases, gcc 12 generates a `{vex}` pseudo-prefix for AVX-NNI intrinsics:

```
$ gcc --version
gcc (nb3 20231008) 10.5.0
Copyright (C) 2020 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

$ /usr/pkg/gcc12/bin/gcc --version
gcc (GCC) 12.3.0
Copyright (C) 2022 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

$ pwd
/tmp/libdeflate-1.20

$ gcc -O2 -S -o /tmp/a1.s lib/adler32.c; as -o /dev/null /tmp/a1.s

$ /usr/pkg/gcc12/bin/gcc -O2 -S -o /tmp/a2.s lib/adler32.c; as -o /dev/null /tmp/a2.s

/tmp/a2.s: Assembler messages:
/tmp/a2.s:761: Error: unsupported instruction `vpdpbusd'
/tmp/a2.s:763: Error: unsupported instruction `vpdpbusd'
/tmp/a2.s:765: Error: unsupported instruction `vpdpbusd'
/tmp/a2.s:766: Error: unsupported instruction `vpdpbusd'
/tmp/a2.s:767: Error: unsupported instruction `vpdpbusd'
/tmp/a2.s:785: Error: unsupported instruction `vpdpbusd'
/tmp/a2.s:786: Error: unsupported instruction `vpdpbusd'
/tmp/a2.s:812: Error: unsupported instruction `vpdpbusd'
/tmp/a2.s:814: Error: unsupported instruction `vpdpbusd'
/tmp/a2.s:907: Error: unsupported instruction `vpdpbusd'
/tmp/a2.s:908: Error: unsupported instruction `vpdpbusd'
/tmp/a2.s:909: Error: unsupported instruction `vpdpbusd'
/tmp/a2.s:910: Error: unsupported instruction `vpdpbusd'
/tmp/a2.s:915: Error: unsupported instruction `vpdpbusd'
/tmp/a2.s:917: Error: unsupported instruction `vpdpbusd'
/tmp/a2.s:918: Error: unsupported instruction `vpdpbusd'
/tmp/a2.s:919: Error: unsupported instruction `vpdpbusd'

$ sed -n 761p /tmp/a2.s
        {vex} vpdpbusd  %ymm7, %ymm2, %ymm0

$
```

You'll need a newer assembler for gcc-12's output -- binutils-2.36, at least.
(The as(1) from binutils-2.39 in -HEAD works fine.)

-RVP

Follow-Ups:
- Re: HEADS UP: plan to switch many ports over to GCC 12 soon
  - From: Rhialto

References:
- Re: HEADS UP: plan to switch many ports over to GCC 12 soon
  - From: Rhialto
- re: HEADS UP: plan to switch many ports over to GCC 12 soon
  - From: matthew green
- Re: HEADS UP: plan to switch many ports over to GCC 12 soon
  - From: Rhialto

Prev by Date: Re: HEADS UP: plan to switch many ports over to GCC 12 soon
Next by Date: Re: HEADS UP: plan to switch many ports over to GCC 12 soon
Previous by Thread: Re: HEADS UP: plan to switch many ports over to GCC 12 soon
Next by Thread: Re: HEADS UP: plan to switch many ports over to GCC 12 soon
Indexes:

Home | Main Index | Thread Index | Old Index