Current-Users archive

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index][Old Index]

Re: HEADS UP: plan to switch many ports over to GCC 12 soon



On Mon 01 Jul 2024 at 05:22:26 +1000, matthew green wrote:
> Rhialto writes:
> > [ 55%] Building C object CMakeFiles/libdeflate_static.dir/lib/adler32.c.o
> > /tmp//ccZfStzB.s:1377: Error: unsupported instruction `vpdpbusd'
> 
> this is odd.  it feels like the compiler is being invoked wrongly, as
> it will need some -m<opt> to enable these instructions, including by
> passing down relevant options to gas.
> 
> can you get the full compile line for these files?

Summary: it seems that libdeflate explicitly uses some gcc-intrinsic
functions on gcc 11+, and it is expected that the assembler can handle
the result.  So this doesn't depend on a compiler option and it looks
like we cannot disable it either.

Long version:

This is from .work.log for adler32.c, the offending source file.

[*] cc -DLIBDEFLATE_DLL -Dlibdeflate_shared_EXPORTS -I/tmp/pkgsrc/devel/libdeflate/default/libdeflate-1.20 -O2 -I/usr/pkg/include -I/usr/include -O2 -DNDEBUG -std=gnu99 -fPIC -fvisibility=hidden -Wall -Wdeclaration-after-statement -Wimplicit-fallthrough -Wmissing-field-initializers -Wmissing-prototypes -Wpedantic -Wshadow -Wstrict-prototypes -Wundef -Wvla -MD -MT CMakeFiles/libdeflate_shared.dir/lib/adler32.c.o -MF CMakeFiles/libdeflate_shared.dir/lib/adler32.c.o.d -o CMakeFiles/libdeflate_shared.dir/lib/adler32.c.o -c /tmp/pkgsrc/devel/libdeflate/default/libdeflate-1.20/lib/adler32.c
<.> /tmp/pkgsrc/devel/libdeflate/default/.gcc/bin/gcc -fcommon -fstack-protector-strong -D_FORTIFY_SOURCE=2 -Wl,-zrelro -fPIC -DLIBDEFLATE_DLL -Dlibdeflate_shared_EXPORTS -I/tmp/pkgsrc/devel/libdeflate/default/libdeflate-1.20 -O2 -I/tmp/pkgsrc/devel/libdeflate/default/.buildlink/include -O2 -DNDEBUG -std=gnu99 -fPIC -fvisibility=hidden -Wall -Wdeclaration-after-statement -Wimplicit-fallthrough -Wmissing-field-initializers -Wmissing-prototypes -Wpedantic -Wshadow -Wstrict-prototypes -Wundef -Wvla -MD -MT CMakeFiles/libdeflate_shared.dir/lib/adler32.c.o -MF CMakeFiles/libdeflate_shared.dir/lib/adler32.c.o.d -o CMakeFiles/libdeflate_shared.dir/lib/adler32.c.o -c /tmp/pkgsrc/devel/libdeflate/default/libdeflate-1.20/lib/adler32.c

pkg_comp:default.conf# ls -l /tmp/pkgsrc/devel/libdeflate/default/.gcc/bin/gcc
lrwxr-xr-x  1 root  wheel  22 Jul  1 08:30 /tmp/pkgsrc/devel/libdeflate/default/.gcc/bin/gcc -> /usr/pkg/gcc12/bin/gcc

The

/tmp//cceAo651.s:1377: Error: unsupported instruction `vpdpbusd'

seems to be the result of code, included from libdeflate-1.20/lib/adler32_impl.h:

/*
 * AVX-VNNI implementation.  This is used on CPUs that have AVX2 and AVX-VNNI
 * but don't have AVX-512, for example Intel Alder Lake.
 */
#if GCC_PREREQ(11, 1) || CLANG_PREREQ(12, 0, 13000000) || MSVC_PREREQ(1930)
#  define adler32_x86_avx2_vnni adler32_x86_avx2_vnni
#  define SUFFIX                           _avx2_vnni
#  define ATTRIBUTES            _target_attribute("avx2,avxvnni")
#  define VL                    32
#  define USE_VNNI              1
#  define USE_AVX512            0
#  include "adler32_template.h"
#endif

libdeflate-1.20/lib/x86/adler32_template.h has

#if USE_VNNI
        /*
         * This is Adler-32 using the vpdpbusd instruction from AVX512VNNI or
         * AVX-VNNI.  vpdpbusd multiplies the unsigned bytes of one vector by
         * the signed bytes of another vector and adds the sums in groups of 4
         * to the 32-bit elements of a third vector.  We use it in two ways:
         * multiplying the data bytes by a sequence like 64,63,62,...,1 for
         * calculating part of s2, and multiplying the data bytes by an all-ones
         * sequence 1,1,1,...,1 for calculating s1 and part of s2.  The all-ones
         * trick seems to be faster than the alternative of vpsadbw + vpaddd.
         */

...
                                v_s2   = VDPBUSD(v_s2,   data_a, mults);
and several more such calls
(VDPBUSD isn't exactly the same word as vpdpbusd but it's the closest one)

and this value for VDPBUSD, chosen because VL == 32 and USE_AVX512 == 0.
#    define VDPBUSD(a, b, c)    _mm256_dpbusd_avx_epi32((a), (b), (c))


So it seems that gcc12 is known to have this intrinsic function, some people
are using it, and the assembler is expected to cope somehow.
It doesn't seem like that gcc generates this instruction spontaneously.

/usr/pkg/gcc12/lib/gcc/x86_64--netbsd/12.3.0/include/avxvnniintrin.h has 

extern __inline __m256i
__attribute__((__gnu_inline__, __always_inline__, __artificial__))
_mm256_dpbusd_avx_epi32(__m256i __A, __m256i __B, __m256i __C)
{
  return   (__m256i) __builtin_ia32_vpdpbusd_v8si ((__v8si) __A,
                                                   (__v8si) __B,
                                                   (__v8si) __C);
}


Somewhat similarly, I had to patch Firefox a bit too:

+++ third_party/gemmology/gemmology.h

-#ifdef __AVXVNNI__
+#ifdef __AVXVNNI__NOT

which may take out too much (I don't know exactly which instructions
aren't supported) but it worked in a pinch.

> .mrg.
-Olaf.
-- 
___ Olaf 'Rhialto' Seibert                            <rhialto/at/falu.nl>
\X/ There is no AI. There is just someone else's work.           --I. Rose

Attachment: signature.asc
Description: PGP signature



Home | Main Index | Thread Index | Old Index