Port-amd64 archive

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index][Old Index]

Re: dropping common asm functions



Le 19/04/2020 à 19:40, Andrew Doran a écrit :
> On Wed, Apr 15, 2020 at 08:30:38PM +0200, Maxime Villard wrote:
> 
>> I'd like to remove all the str*.S files from libc/arch/x86_64/string/, and
>> change the Makefile to use the C version of the functions instead.
>>
>> Asm doesn't receive instrumentation, and we have to resort to annoying wrappers
>> all the time (kmsan_strcat, kasan_strcat, etc).
> 
> What other functions might this affect?

You mean which functions do not receive instrumentation?

None of the asm functions receives instrumentation. This implies that the
instrumentation has to be applied manually, in a way that is more or less
instrusive, depending on the sanitizer.

KASAN/KCSAN are fine with loss of coverage. Ie if we don't have a wrapper
around an asm function, then it just means that the bugs in this function
won't be detected, but the system will continue to function.

KMSAN however needs comprehensive coverage for the system to function, so
strictly each asm function needs instrumentation. See the KMSAN_INIT_ARG
and KMSAN_INIT_RET macros in everything that is not wrapped.

Overall, certain sets of functions are clearly defined as being in asm on
all (most?) architectures. You can see them in subr_asan.c (copy*, ucas*,
ufetch*, atomic_*, bus_space_*). For these functions, we have MI wrappers
that do the job, and that's good.

When it comes to libc functions however, each architecture goes freestyle
and implements random sets of functions in asm, and it is hard to get a
clear understanding of what needs to be wrapped.

Eg I recently realized that aarch64 implements strnlen in asm (contrary
to amd64 which does it in C), and there is no wrapper for it. I could add
a wrapper for each sanitizer, but I would as well prefer to just switch
the function to C.

I see the need to clearly define which function should be in asm. I think
that the string functions should be in C.

Obviously we could say that it doesn't matter and that each wrapper should
be added as needed, but this will slow down the adoption of sanitizers on
other architectures, and has the risk of growing into an unmaintainable
spaghetti monster of wrappers.

>> Not relevant for performance, 
> 
> I don't see it that way.  The cumulative cost of primitives adds up and
> there would be an affect from this change, albeit small.

A discussion I've had these last days with Ryo Shimizu changed my mind a
little. The cumulative performance gain might indeed be relevant for
strlen and strcmp.

Overall my point of view mostly remains that it is futile to try to reduce
the micro-cost induced by C, when 99.9% of libc is in C anyway. We could
as well rewrite all of libc in asm in order to reduce the micro-cost, but
that is just futile, and at some point it also matters to have clean and
understandable MI code.

While maybe not all string functions should be switched to C, I think that
strcpy and strcat should.

> Timings in microseconds for 1000 calls to strncmp() on identical strings, on
> two very different CPUs comparing the C and assembly versions we have in
> libc now:
> 
> Length  Xeon    Xeon    G-T40E  G-T40E
>         C       asm     C       asm
> ------  ------- ------- ------- -------
> 4       8       7       31      32
> 32      40      35      187     159
> 1024    1004    1005    5147    4669

But... this doesn't look so bright, does it? The gain is not systematic,
and even then seems fairly negligible. I mean we could probably save that
noise by just aligning the function text to a cache line, yet we don't
feel like we need to do that, do we?

Maxime


Home | Main Index | Thread Index | Old Index