On Mon 01 Jul 2024 at 05:22:26 +1000, matthew green wrote: > Rhialto writes: > > [ 55%] Building C object CMakeFiles/libdeflate_static.dir/lib/adler32.c.o > > /tmp//ccZfStzB.s:1377: Error: unsupported instruction `vpdpbusd' > > this is odd. it feels like the compiler is being invoked wrongly, as > it will need some -m<opt> to enable these instructions, including by > passing down relevant options to gas. > > can you get the full compile line for these files? Summary: it seems that libdeflate explicitly uses some gcc-intrinsic functions on gcc 11+, and it is expected that the assembler can handle the result. So this doesn't depend on a compiler option and it looks like we cannot disable it either. Long version: This is from .work.log for adler32.c, the offending source file. [*] cc -DLIBDEFLATE_DLL -Dlibdeflate_shared_EXPORTS -I/tmp/pkgsrc/devel/libdeflate/default/libdeflate-1.20 -O2 -I/usr/pkg/include -I/usr/include -O2 -DNDEBUG -std=gnu99 -fPIC -fvisibility=hidden -Wall -Wdeclaration-after-statement -Wimplicit-fallthrough -Wmissing-field-initializers -Wmissing-prototypes -Wpedantic -Wshadow -Wstrict-prototypes -Wundef -Wvla -MD -MT CMakeFiles/libdeflate_shared.dir/lib/adler32.c.o -MF CMakeFiles/libdeflate_shared.dir/lib/adler32.c.o.d -o CMakeFiles/libdeflate_shared.dir/lib/adler32.c.o -c /tmp/pkgsrc/devel/libdeflate/default/libdeflate-1.20/lib/adler32.c <.> /tmp/pkgsrc/devel/libdeflate/default/.gcc/bin/gcc -fcommon -fstack-protector-strong -D_FORTIFY_SOURCE=2 -Wl,-zrelro -fPIC -DLIBDEFLATE_DLL -Dlibdeflate_shared_EXPORTS -I/tmp/pkgsrc/devel/libdeflate/default/libdeflate-1.20 -O2 -I/tmp/pkgsrc/devel/libdeflate/default/.buildlink/include -O2 -DNDEBUG -std=gnu99 -fPIC -fvisibility=hidden -Wall -Wdeclaration-after-statement -Wimplicit-fallthrough -Wmissing-field-initializers -Wmissing-prototypes -Wpedantic -Wshadow -Wstrict-prototypes -Wundef -Wvla -MD -MT CMakeFiles/libdeflate_shared.dir/lib/adler32.c.o -MF CMakeFiles/libdeflate_shared.dir/lib/adler32.c.o.d -o CMakeFiles/libdeflate_shared.dir/lib/adler32.c.o -c /tmp/pkgsrc/devel/libdeflate/default/libdeflate-1.20/lib/adler32.c pkg_comp:default.conf# ls -l /tmp/pkgsrc/devel/libdeflate/default/.gcc/bin/gcc lrwxr-xr-x 1 root wheel 22 Jul 1 08:30 /tmp/pkgsrc/devel/libdeflate/default/.gcc/bin/gcc -> /usr/pkg/gcc12/bin/gcc The /tmp//cceAo651.s:1377: Error: unsupported instruction `vpdpbusd' seems to be the result of code, included from libdeflate-1.20/lib/adler32_impl.h: /* * AVX-VNNI implementation. This is used on CPUs that have AVX2 and AVX-VNNI * but don't have AVX-512, for example Intel Alder Lake. */ #if GCC_PREREQ(11, 1) || CLANG_PREREQ(12, 0, 13000000) || MSVC_PREREQ(1930) # define adler32_x86_avx2_vnni adler32_x86_avx2_vnni # define SUFFIX _avx2_vnni # define ATTRIBUTES _target_attribute("avx2,avxvnni") # define VL 32 # define USE_VNNI 1 # define USE_AVX512 0 # include "adler32_template.h" #endif libdeflate-1.20/lib/x86/adler32_template.h has #if USE_VNNI /* * This is Adler-32 using the vpdpbusd instruction from AVX512VNNI or * AVX-VNNI. vpdpbusd multiplies the unsigned bytes of one vector by * the signed bytes of another vector and adds the sums in groups of 4 * to the 32-bit elements of a third vector. We use it in two ways: * multiplying the data bytes by a sequence like 64,63,62,...,1 for * calculating part of s2, and multiplying the data bytes by an all-ones * sequence 1,1,1,...,1 for calculating s1 and part of s2. The all-ones * trick seems to be faster than the alternative of vpsadbw + vpaddd. */ ... v_s2 = VDPBUSD(v_s2, data_a, mults); and several more such calls (VDPBUSD isn't exactly the same word as vpdpbusd but it's the closest one) and this value for VDPBUSD, chosen because VL == 32 and USE_AVX512 == 0. # define VDPBUSD(a, b, c) _mm256_dpbusd_avx_epi32((a), (b), (c)) So it seems that gcc12 is known to have this intrinsic function, some people are using it, and the assembler is expected to cope somehow. It doesn't seem like that gcc generates this instruction spontaneously. /usr/pkg/gcc12/lib/gcc/x86_64--netbsd/12.3.0/include/avxvnniintrin.h has extern __inline __m256i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) _mm256_dpbusd_avx_epi32(__m256i __A, __m256i __B, __m256i __C) { return (__m256i) __builtin_ia32_vpdpbusd_v8si ((__v8si) __A, (__v8si) __B, (__v8si) __C); } Somewhat similarly, I had to patch Firefox a bit too: +++ third_party/gemmology/gemmology.h -#ifdef __AVXVNNI__ +#ifdef __AVXVNNI__NOT which may take out too much (I don't know exactly which instructions aren't supported) but it worked in a pinch. > .mrg. -Olaf. -- ___ Olaf 'Rhialto' Seibert <rhialto/at/falu.nl> \X/ There is no AI. There is just someone else's work. --I. Rose
Attachment:
signature.asc
Description: PGP signature