Summary: it seems that libdeflate explicitly uses some gcc-intrinsic
functions on gcc 11+, and it is expected that the assembler can handle
the result. So this doesn't depend on a compiler option and it looks
like we cannot disable it either.
Long version:
This is from .work.log for adler32.c, the offending source file.
[*] cc -DLIBDEFLATE_DLL -Dlibdeflate_shared_EXPORTS -I/tmp/pkgsrc/devel/libdeflate/default/libdeflate-1.20 -O2 -I/usr/pkg/include -I/usr/include -O2 -DNDEBUG -std=gnu99 -fPIC -fvisibility=hidden -Wall -Wdeclaration-after-statement -Wimplicit-fallthrough -Wmissing-field-initializers -Wmissing-prototypes -Wpedantic -Wshadow -Wstrict-prototypes -Wundef -Wvla -MD -MT CMakeFiles/libdeflate_shared.dir/lib/adler32.c.o -MF CMakeFiles/libdeflate_shared.dir/lib/adler32.c.o.d -o CMakeFiles/libdeflate_shared.dir/lib/adler32.c.o -c /tmp/pkgsrc/devel/libdeflate/default/libdeflate-1.20/lib/adler32.c
<.> /tmp/pkgsrc/devel/libdeflate/default/.gcc/bin/gcc -fcommon -fstack-protector-strong -D_FORTIFY_SOURCE=2 -Wl,-zrelro -fPIC -DLIBDEFLATE_DLL -Dlibdeflate_shared_EXPORTS -I/tmp/pkgsrc/devel/libdeflate/default/libdeflate-1.20 -O2 -I/tmp/pkgsrc/devel/libdeflate/default/.buildlink/include -O2 -DNDEBUG -std=gnu99 -fPIC -fvisibility=hidden -Wall -Wdeclaration-after-statement -Wimplicit-fallthrough -Wmissing-field-initializers -Wmissing-prototypes -Wpedantic -Wshadow -Wstrict-prototypes -Wundef -Wvla -MD -MT CMakeFiles/libdeflate_shared.dir/lib/adler32.c.o -MF CMakeFiles/libdeflate_shared.dir/lib/adler32.c.o.d -o CMakeFiles/libdeflate_shared.dir/lib/adler32.c.o -c /tmp/pkgsrc/devel/libdeflate/default/libdeflate-1.20/lib/adler32.c
pkg_comp:default.conf# ls -l /tmp/pkgsrc/devel/libdeflate/default/.gcc/bin/gcc
lrwxr-xr-x 1 root wheel 22 Jul 1 08:30 /tmp/pkgsrc/devel/libdeflate/default/.gcc/bin/gcc -> /usr/pkg/gcc12/bin/gcc
The
/tmp//cceAo651.s:1377: Error: unsupported instruction `vpdpbusd'
seems to be the result of code, included from libdeflate-1.20/lib/adler32_impl.h:
/*
* AVX-VNNI implementation. This is used on CPUs that have AVX2 and AVX-VNNI
* but don't have AVX-512, for example Intel Alder Lake.
*/
#if GCC_PREREQ(11, 1) || CLANG_PREREQ(12, 0, 13000000) || MSVC_PREREQ(1930)
# define adler32_x86_avx2_vnni adler32_x86_avx2_vnni
# define SUFFIX _avx2_vnni
# define ATTRIBUTES _target_attribute("avx2,avxvnni")
# define VL 32
# define USE_VNNI 1
# define USE_AVX512 0
# include "adler32_template.h"
#endif
libdeflate-1.20/lib/x86/adler32_template.h has
#if USE_VNNI
/*
* This is Adler-32 using the vpdpbusd instruction from AVX512VNNI or
* AVX-VNNI. vpdpbusd multiplies the unsigned bytes of one vector by
* the signed bytes of another vector and adds the sums in groups of 4
* to the 32-bit elements of a third vector. We use it in two ways:
* multiplying the data bytes by a sequence like 64,63,62,...,1 for
* calculating part of s2, and multiplying the data bytes by an all-ones
* sequence 1,1,1,...,1 for calculating s1 and part of s2. The all-ones
* trick seems to be faster than the alternative of vpsadbw + vpaddd.
*/
...
v_s2 = VDPBUSD(v_s2, data_a, mults);
and several more such calls
(VDPBUSD isn't exactly the same word as vpdpbusd but it's the closest one)
and this value for VDPBUSD, chosen because VL == 32 and USE_AVX512 == 0.
# define VDPBUSD(a, b, c) _mm256_dpbusd_avx_epi32((a), (b), (c))
So it seems that gcc12 is known to have this intrinsic function, some people
are using it, and the assembler is expected to cope somehow.
It doesn't seem like that gcc generates this instruction spontaneously.
/usr/pkg/gcc12/lib/gcc/x86_64--netbsd/12.3.0/include/avxvnniintrin.h has
extern __inline __m256i
__attribute__((__gnu_inline__, __always_inline__, __artificial__))
_mm256_dpbusd_avx_epi32(__m256i __A, __m256i __B, __m256i __C)
{
return (__m256i) __builtin_ia32_vpdpbusd_v8si ((__v8si) __A,
(__v8si) __B,
(__v8si) __C);
}