Re: Avoid UB in pslist.h (NULL + 0)

To: Taylor R Campbell <campbell+netbsd-tech-kern%mumble.net@localhost>
Subject: Re: Avoid UB in pslist.h (NULL + 0)
From: Kamil Rytarowski <n54%gmx.com@localhost>
Date: Tue, 24 Mar 2020 14:30:33 +0100
On 24.03.2020 07:43, Taylor R Campbell wrote:
>> Date: Sun, 22 Mar 2020 03:30:56 +0100
>> From: Kamil Rytarowski <n54%gmx.com@localhost>
>>
>> On 22.03.2020 01:50, Taylor R Campbell wrote:
>>> So far, after several weeks of discussion, nobody has presented a case
>>> that there is a credible thread of a compiler actually misbehaving in
>>> this scenario.
>>
>> There are no public declarations (that was requested on a local ML) but
>> according to my IRC talks, there were plans to start optimizing on NULL
>> + 0 operations, but is was/is considered as a chicken-egg issue. There
>> is a time span now to alert users and
> 
> Perhaps it is not a chicken-egg issue, but pointless abuse of leeway
> in the standard.  So far the public declarations are that Clang
> developers realized they _could_ abuse the leeway this way but they
> _chose not to_; that essentially everyone involved sees such
> `optimization' as abuse; that there are no public reasons stated for
> why the C standard diverges from the C++ standard on this note.
> 
> If you have contrary information, please cite it specifically.
> 

I have nothing to add. I'm personally agnostic to both sides.

>> Both languages can be synced here as the incompatibility was revealed
>> and discussed. Personally I wouldn't be surprised to see adoption of the
>> C behavior as nullptr + 0 is invalid in C++ (and it will be likely
>> invalid in future C).
> 
> The fragment nullptr + 0 is invalid for an unrelated reason: nullptr
> is a pointer to an incomplete type.  We've gone over this already.
> 
> It's not helpful to keep bringing it up: at best it's a distraction,
> and at worst it gives the impression you might not understand basic
> aspects of C and C++ -- an impression which, if true, would make it
> all the more important that you _not_ go around tweaking things to
> appease sanitizers before discussing and understanding them.
> 
>> So far in reply I get 'just noise from the tool' so maybe my messages
>> were not clear enough and reaching authorities (from the clang community
>> and C committee) not credible enough.
> 
> You successfully presented a case that it is technically undefined
> behaviour.  This is not in dispute -- a month ago I cited the specific
> place where the C standard neglects to define it[*] -- so I don't
> understand why you continue to argue that case. 
> 
> [*] https://mail-index.NetBSD.org/tech-kern/2020/02/25/msg026105.html
> 

I reported that these issues are triggered in userland rump (+ in the
pslist.h ATF tests). Rump has userland assumptions with compiler flags.

I already declarared that using assembly code is not planned to be touched.

>>> (b) Change how we invoke ubsan and the compiler by passing
>>>     -fno-delete-null-pointer-checks to clang.  joerg objected to this
>>>     but I don't recall the details off the top of my head; joerg, can
>>>     you expand on your argument against this, and which alternative
>>>     you would prefer?
>>
>> I tried to enable -fno-delete-null-pointer-check for RUMPKERNEL, but I
>> was asked to revert... after first getting acknowledge from you that it
>> is a good idea...
> 
> This is an inaccurate representation of what happened.  What I said is
> (https://mail-index.NetBSD.org/tech-kern/2020/03/08/msg026125.html):
> 
>   `Adding -fno-delete-null-pointer-checks for clang too sounds
>    sensible to me in general, but please check with joerg first.  It
>    remains unclear to me that it's necessary here.'
> 
> I asked you to revert because:
> (a) the discussion was still ongoing without a conclusion,
> (b) it remained unclear whether the change was necessary,

I noted that it is necessary at least for memcpy(NULL, NULL, X)-like
usage in rump. Even if we ignore NULL + 0.

The kernel code is already prebuilt with
-fno-delete-null-pointer-checks, but not userland rump. This causes
slight incompatibilities.

I don't see what's unclear. I k

> (c) you failed to check with joerg like I asked, and

I checked earlier that joerg disagrees with GCC and he has its
interpretation of the standard that memcpy(NULL, NULL, 0)-like
optimization is wrong.

> (d) joerg objected.
> 

And GCC developers do not support this (me neither).

> I appreciate that sometimes it takes longer than you or I might like
> to get a clear explanation out of joerg, but it is also fatiguing to
> have to correct misrepresentations of positions in long meandering
> threads in order to ensure changes you are itching to make -- or have
> already made despite ongoing discussion -- are justified and correct.
> 
>>> (c) Change how we invoke ubsan, but just ubsan -- not the compiler.
>>
>> UBSan is an integral part of a compiler.
> 
> What I meant is:  Change the options we pass for builds with the
> sanitizer enabled, but not the options we pass for normal builds.  In
> other words, either pass _both_ -fsanitize=undefined and
> -fno-delete-null-pointer-checks (if MKSANITIZER=yes), or _neither_ of
> them (if MKSANITIZER=no).
> 

This could be integrated into the toolchain.... but I consider this as
the wrong approach as every day Clang or GCC can decide to start
optimizing on NULL + 0 and we will note it only when it will crash the code.

>> If we introduce code that relies on UB assumptions it is rather opposite
>> of comprehensible code. Handling UB imposes handling corner-cases that
>> is rather a good style of programming.
> 
> The kernel relies on UB assumptions everywhere, because it's a _part_
> of the C implementation -- large swaths of the kernel are responsible
> for driving a physical machine to implement the C abstract machine.
> 
> As the maintainers of the kernel, we have to distinguish the UB that
> actually may have bad consequences, not blindly reject all UB because
> the letter of the standard doesn't define it in the C abstract
> machine.
> 

We also use this code in userland rump and this is the only focus here.

>>> (f) Ditch ubsan.  Does it have enough _true_ alarms, detecting actual
>>>     bugs, to make it worth keeping, or are we wasting a lot of time to
>>>     appease a tool that isn't actually giving us much value?
>>
>> 1. Signed integer overflows (ignoring the signedness bugs; I recall at
>> least 3 real bugs and one set of bugs generating garbage results)
>> 2. Alignment issues.
>> 3. Invalid VLA bound.
>> 4. Invalid shifts (base out of valid range)
>> 5. Pointer overflows.
> 
> Thanks.  Can you point to some of these bugs and fixes so we can
> document ubsan's record of utility?
> 

It's already tracked with kUBSan tags in the commit messages. Please
check with 'git log --grep'.

If we would use UBSan extensively, we would catch more real bugs even
for these signed integer overflows that are usually speaking
uninteresting. On the other hand these checks help to catch real bugs.
FreeBSD fixed ahd(4) issue that triggers UBSan as signedness bit is
changed un:

https://svnweb.freebsd.org/base?view=revision&sortby=date&revision=357300

There is an open PR for this, but I don't have hw to test or patch our
ahd(4) driver.

http://gnats.netbsd.org/cgi-bin/query-pr-single.pl?number=54899

>>>     Other issues that I've seen raised by ubsan in practice, from
>>>     memory: memcpy(NULL,NULL,0), floating-point division by zero,
>>>     alignment issues.  The first two are non-issues in real
>>>     implementations; the last suggests that maybe the tests we're
>>>     running on, e.g., sparc64 aren't very extensive.
>>
>> 1. memcpy(NULL, NULL, 0): GCC is a real implementation.
>>
>> http://netbsd.org/~kamil/memcpy-ub.c
> 
> We already use -fno-delete-null-pointer-checks with GCC, so this is
> not relevant to us.  We've gone over this already.
> 

I disagree with this approach that UB is relevant only to compilers that
can miscompile the code.

>> 2. We don't use extensively floating-point and definitely not in the
>> kernel space. Currently there are no reports for it. I don't know what
>> is the expected result for division by zero and why would we call it a
>> well defined code.
> 
> This came up in another project[1], not in NetBSD.  IEEE 754
> floating-point division by zero is well-defined, and the vast majority
> of numerical code of the past quarter century relies on it.
> 
> Technically it may be undefined behaviour in C, yet IEEE 754
> arithmetic semantics is _more reliable_ than signed integer arithmetic
> semantics in practice, so ubsan's warnings about it were
> counterproductive in the real world; in the end it looks like clang
> may have just disabled this misfeature in ubsan because of that[2].
> 
> (Yes, here in NetBSD we kinda support VAX floating-point arithmetic,
> but I haven't seen any evidence that ubsan has helped us with
> maintaining VAX support.)
> 
> [1] https://trac.torproject.org/projects/tor/ticket/29527
>     https://trac.torproject.org/projects/tor/ticket/29528
> 
> [2] https://bugs.llvm.org/show_bug.cgi?id=19535
> 

I acknowledge it, but it is a problem now of mostly 3rd party code now.

I am familiar that the C standard committee actively works on floating
point specification and maybe we could standardize IEEE 754 in future.

>>>     Maybe someone can present an argument that ubsan is worth the pain
>>>     of patching and working around false alarms and merging local
>>>     changes into upstream updates.  It's not a priori clear to me --
>>>     I'm not proposing that we ditch ubsan; I'm just saying I'm not the
>>>     one to argue the case that we should use it as a bug-finding tool
>>>     and take on the costs of working around its many false alarms as a
>>>     rule.
>>
>> Possibly we talk about different things under the same of a 'false alarm'.
>>
>> If I deduce your definition correctly as 'usually works', we could stop
>> patching e.g. use-after-free our write out of bounds as that code
>> usually works.
> 
> You did not deduce this correctly.
> 
> Nobody seriously disputes that use-after-free or buffer overruns are
> bugs in general.  And I'll be the first to say that certain classes of
> bugs -- use-after-free and buffer overruns included -- can be treated
> as security issues whether or not an exploit is demonstrated.  E.g.:
> https://googleprojectzero.blogspot.com/2014/08/the-poisoned-nul-byte-2014-edition.html
> 

I've landed two patches for buffer overruns (one of them was declared to
be fixed in a better way by kre).

The use-after-free one in rump is harder to fix and I will look into later.

That code used to work for years on all tested ports.

> But so far I haven't seen any serious reason to treat adding zero to a
> null pointer as a bug.
> 

I feel like being pushed to implement this optimization in LLVM myself.

> 
> I asked you to present an argument about a set of options (a)-(f)
> about how to proceed. 

OK.

> I am not totally sure what your argument was,
> but it sounds like you objected to (f) [ditch ubsan] and favoured (b)
> [pass -fno-delete-null-pointer-check always] or (d) [patch ubsan]; it
> is unclear to me whether you favour, object to, or have no position on
> (a) [status quo -- live with the warnings], (c) [pass
> -fno-delete-null-pointer-check only when sanitizers are enabled] or
> (e) [tweak our code to suppress false alarms].  If that's not
> accurate, please clarify.
> 

I fundamentally disagree that these points mentioned by you enforce
accepting terminology of 'false alarms' so 'accepting' these points is
not realizable.

As my opinion is requested, I would iterate in this order:

(1) Pass -fno-delete-null-pointer-check to rump unconditionally (for
GCC+Clang). This will pacify all concerns whether we agree or not with
clang/gcc on particular cases and ensure that similar semantics is
reproduced in the kernel and in the rumpkernel. Do the same for pslist.h
ATF tests.

(2) Avoid NULL+0 (reported only in pslist.h) and memcpy(NULL, NULL, 0)
usage in rumpkernel (reported in ~3 places so far).

(3) Patch Clang to start optimizing on NULL + in C so we can return to
points (1) and (2).

(4) Patch Clang to disable NULL+0 checks on NetBSD. A poor man solution
is patching the runtime to suppress it. I have no intention to try to
convince upstream of Clang that NULL+0 checks are wrong. It looks like a
time bomb of miscompiling the code.

(5) Pass -fno-delete-null-pointer-check for UBSan invocation... but this
just papers over the issues and is a time bomb of miscompiling non-UBSan
code.

(6) Ignore / disable UBSan (both are equivalent).

This would be a fiasco and sad especially that there are virtually no
false alarms. In the whole kernel there is just a single place marked
with __noubsan that is a real false positive, in x86 hotpatch code that
relies on linker assistance.

https://nxr.netbsd.org/xref/src/sys/arch/x86/x86/patch.c#136

I note that there are commercial vendors interested in this work to
happen (that matters for their research whether to start or continue
using the NetBSD code) as currently the NetBSD/rumpkernel code is harder
to be trusted without these objective metrics and checks (such as
checked with *San).

rumpkernel is not widely used as:
(A) Quality is unclear.
(B) There is shortage of coding examples.
(C) The framework is not sufficiently documented.

Not many vendors are ready to risk and trust the NetBSD/rump code. This
QA effort and planned GSoC tasks for this year intend to improve all
these A-C points.

Also if we want to fuzz the rumpkernel code (GSoC task proposal and
research by some of our devs in a few years), we MUST use sanitizers.
Otherwise the efficiency of fuzzing is one or two orders of magnitude
less effective.
Attachment: signature.asc
Description: OpenPGP digital signature
Follow-Ups:
- Re: Avoid UB in pslist.h (NULL + 0)
  - From: Kamil Rytarowski
References:
- Re: Avoid UB in pslist.h (NULL + 0)
  - From: Taylor R Campbell
Prev by Date: Re: Avoid UB in pslist.h (NULL + 0)
Next by Date: Re: Avoid UB in pslist.h (NULL + 0)
Previous by Thread: Re: Avoid UB in pslist.h (NULL + 0)
Next by Thread: Re: Avoid UB in pslist.h (NULL + 0)
Indexes:
Home | Main Index | Thread Index | Old Index