Re: Avoid UB in pslist.h (NULL + 0)

To: Taylor R Campbell <campbell+netbsd-tech-kern%mumble.net@localhost>
Subject: Re: Avoid UB in pslist.h (NULL + 0)
From: Kamil Rytarowski <n54%gmx.com@localhost>
Date: Sun, 22 Mar 2020 03:30:56 +0100
On 22.03.2020 01:50, Taylor R Campbell wrote:
>> Date: Sun, 22 Mar 2020 00:03:57 +0100
>> From: Kamil Rytarowski <n54%gmx.com@localhost>
>>
>> I propose to change the fun(pointer + 0) logic with fun(pointer, 0).
> 
> I don't think this is a good approach -- it requires modifying code
> further and further away from the relevant part.
> 
> 
> But let's step back a moment.
> 
> So far, after several weeks of discussion, nobody has presented a case
> that there is a credible thread of a compiler actually misbehaving in
> this scenario.
> 

There are no public declarations (that was requested on a local ML) but
according to my IRC talks, there were plans to start optimizing on NULL
+ 0 operations, but is was/is considered as a chicken-egg issue. There
is a time span now to alert users and

It was also confirmed by two people that clang or a C compiler in
general is allowed to optimize the code.

> Yes, it is technically undefined behaviour -- nobody disputes that --
> but so is lots of other stuff that NetBSD relies on such as inline
> asm.  In C++, it is explicitly _not_ undefined behaviour; Clang
> specifically stopped short of exploiting it as undefined behaviour in
> C; nobody presented reasons why it should be undefined in C but
> defined in C++.
> 

Our kernel/rump code is mostly in C or at most a common subset of C and
C++. As the languages are not fully compatible in details, it's safe to
assume the C behavior that is valid for every C++ program, rather than
C++ one that is not valid for C.

Both languages can be synced here as the incompatibility was revealed
and discussed. Personally I wouldn't be surprised to see adoption of the
C behavior as nullptr + 0 is invalid in C++ (and it will be likely
invalid in future C).

> So this all serves to work around false alarms from ubsan -- not
> actual bugs, just noise from the tool.  Presumably the reason we use
> ubsan at all is that it helps find actual bugs -- true alarms.
> There's a few ways we might approach the false alarms:
> 

Out of 3 cases that I was requested, 3 of them were confirmed to be real
bugs (alignment, memcpy(NULL,NULL, 0), ptr + 0). Two of the cases can
generate crashing code now. One of them was researched by the clang
developers and C committee and confirmed that a compiler can impose
assumptions that would break misdesigned code.

I presented examples for the two UB issues that they are harmful now.

So far in reply I get 'just noise from the tool' so maybe my messages
were not clear enough and reaching authorities (from the clang community
and C committee) not credible enough.

> (a) Ignore them.  It's what we've been doing so far.
> 

This is not true. We already run our kernel with GCC with 0 UBSan
reports and perform syzbot fuzzing. We catch real problems there.

It already happened.

We are reaching to the point of running all ATF reports under UBSan.

If we cannot suppress noise from a tool (any kind of it), then the tool
is not much usable for signaling real problems.

> (b) Change how we invoke ubsan and the compiler by passing
>     -fno-delete-null-pointer-checks to clang.  joerg objected to this
>     but I don't recall the details off the top of my head; joerg, can
>     you expand on your argument against this, and which alternative
>     you would prefer?
> 

I tried to enable -fno-delete-null-pointer-check for RUMPKERNEL, but I
was asked to revert... after first getting acknowledge from you that it
is a good idea...

> (c) Change how we invoke ubsan, but just ubsan -- not the compiler.
>     There's a cost to this: if we diverge from how we invoke the
>     compiler, we might be disabling true alarms from ubsan that
>     reflect how the compiler will actually behave.
> 

UBSan is an integral part of a compiler.

> (d) Patch ubsan to disable the false alarms.  This incurs a
>     maintenance burden, but maybe leaves less of a sharp edge to trip
>     on than setting compiler flags in the makefile -- updates that
>     change the behaviour are more likely to lead to merge conflicts.
> 

I'm fine to patch the tool to disable false alarms or rather report them
upstream.

There are some indeed false positives sometimes when we depend on
assumptions that are imposed after compilation, during linking phase. In
these cases we disable sanitizing as there is no way to teach the tool.

> (e) Patch our own code to suppress the false alarms.  The cost to this
>     churn is that it can introduce bugs of its own, and make the code
>     harder to understand, and the complexity may become obsolete in
>     the next version of the tool but will remain a Chesterton's fence
>     (`why is there a dummy argument in all these functions? can we get
>     rid of it?').
> 

Every single commit can introduce bugs on their own.

If we introduce code that relies on UB assumptions it is rather opposite
of comprehensible code. Handling UB imposes handling corner-cases that
is rather a good style of programming.

"complexity may become obsolete in the next version of the tool" If I
read this sentence correctly, it's using some language features that can
be changed. _Noreturn can be removed from C2x (along with.. intmax_t)
and replaced by a variation borrowed from C++. These things happen
regardless of tool in use and programming language we pick.

> (f) Ditch ubsan.  Does it have enough _true_ alarms, detecting actual
>     bugs, to make it worth keeping, or are we wasting a lot of time to
>     appease a tool that isn't actually giving us much value?
> 

1. Signed integer overflows (ignoring the signedness bugs; I recall at
least 3 real bugs and one set of bugs generating garbage results)
2. Alignment issues.
3. Invalid VLA bound.
4. Invalid shifts (base out of valid range)
5. Pointer overflows.


>     From memory, most of the changes I've seen to appease ubsan are to
>     replace (1 << n) by (1u << n) when n=31.  Some projects take
>     advantage of -ftrapv and use signed integer types to express that
>     overflow is a mistake.  Some projects want two's complement
>     everywhere and use -fwrapv.
> 

-fwrapv generates more code and enforces certain guarantees. This, among
others, allows NULL + X operations.

There are projects that go the opposite direction and specify -ftrapv
(at least for debug/development builds). It's not hard to find a project
that uses -ftrapv for hardening (gdnsd). This gcc/clang command line
argument causes aborts on signed overflows/underflows.

Generally speaking, I agree that these reports are the most
uninteresting ones and the most common ones. We don't support CPUs
sensitive to these issues.

On the other hand almost all such reports are now squashed.

>     We're not committed one way or another; maybe we should keep it
>     that way, and ubsan helps us to do so; maybe it would be easier to
>     commit to -fwrapv.  There is a cost to patching all of these
>     shifts -- it makes merging from upstreams more painful.
> 
>     Other issues that I've seen raised by ubsan in practice, from
>     memory: memcpy(NULL,NULL,0), floating-point division by zero,
>     alignment issues.  The first two are non-issues in real
>     implementations; the last suggests that maybe the tests we're
>     running on, e.g., sparc64 aren't very extensive.
> 

1. memcpy(NULL, NULL, 0): GCC is a real implementation.

http://netbsd.org/~kamil/memcpy-ub.c

2. We don't use extensively floating-point and definitely not in the
kernel space. Currently there are no reports for it. I don't know what
is the expected result for division by zero and why would we call it a
well defined code.

3. Alignment issues, SPARC is sensitive.. but it's common to RISC CPUs.
UBSan caught issues that were breaking ARM code in the kernel. For
PowerPC there can be cases (afair) that misaligned accesses trap only on
page boundaries and so trap randomly. For aarch64 CPUs, there are real
boards that have unimplemented/broken support for misaligned accesses.

From a different perspective, misaligned accesses break (or used to
break as there were some fixes) LLVM XRay on x86.

>     Maybe someone can present an argument that ubsan is worth the pain
>     of patching and working around false alarms and merging local
>     changes into upstream updates.  It's not a priori clear to me --
>     I'm not proposing that we ditch ubsan; I'm just saying I'm not the
>     one to argue the case that we should use it as a bug-finding tool
>     and take on the costs of working around its many false alarms as a
>     rule.
> 

Possibly we talk about different things under the same of a 'false alarm'.

If I deduce your definition correctly as 'usually works', we could stop
patching e.g. use-after-free our write out of bounds as that code
usually works.

I've caught both of these usually works problems that are 'false
positives' in rump.

> First, I'd like to see a clearer argument about these options before
> we move forward with more churn to suppress false alarms.
> 
> 
> _After_ that argument, if we choose (e), patching our code, I would
> like to see a way to get the static type checking in the pslist macros
> (which has detected real bugs, at least in the container_of version
> and probably also for the pslist macros) that doesn't require nonlocal
> changes.  Originally (for container_of) I did something like
> 
> (sizeof(&a - &b), ...)
> 
> to verify that a and b have compatible types, but some compilers (gcc,
> I think) objected to a left-hand expression with no side effects in a
> comma operator.  So I switched to
> 
> (expression yielding a pointer) + 0*sizeof(&a - &b)
> 
> to suppress that warning.  However, coverity didn't like something
> about the sizeof and patching coverity is not an option for us, so we
> defined __validate_container_of(...) to expand to 0 for coverity and
> to 0*sizeof(&a - &b) for non-coverity -- which is fine because we
> still get the type-checking for normal builds.
> 
> This is still a bit of a kludge -- it only works when some
> subexpression is a pointer to a complete type, although since the
> point is to assert that types are compatible that can generally be
> arranged.
> 
> But it's a kludge that's limited in scope to the specific place where
> we want to assert that two expressions have compatible types, and it's
> a kludge that works for real compilers and for static analyzers like
> coverity, and it has no adverse consequences _except_ the false alarms
> in ubsan.
> 

Alarm is true, but as of today not breaking in the toolchains used today.

> So if we're going to find a replacement kludge, I'd like to keep it
> localized at the point where want to assert type compatibility, and
> make sure it works with all our compilers and coverity.  Maybe we
> could find a way to generically write
> 
> _ASSERT_TYPES_COMPATIBLE(P, Q, NEXT)
> 
> so that it expands into an expression equivalent to NEXT but as a side
> effect at compile-time causes an error if P and Q have incompatible
> types, and similarly with _ASSERT_PTR_MEMBER_TYPE(P, T, F, NEXT) or
> something to assert that the member T::F has a type compatible with
> *P.
> 


FreeBSD uses this code (untested, but containerof() is generally tricky
feature).

/*
 * Given the pointer x to the member m of the struct s, return
 * a pointer to the containing structure.  When using GCC, we first
 * assign pointer x to a local variable, to check that its type is
 * compatible with member m.
 */
#if __GNUC_PREREQ__(3, 1)
#define __containerof(x, s, m) ({                                       \
        const volatile __typeof(((s *)0)->m) *__x = (x);                \
        __DEQUALIFY(s *, (const volatile char *)__x - __offsetof(s, m));\
})
#else
#define __containerof(x, s, m)                                          \
        __DEQUALIFY(s *, (const volatile char *)(x) - __offsetof(s, m))
#endif

-- sys/sys/cdefs.h
Attachment: signature.asc
Description: OpenPGP digital signature
Follow-Ups:
- Re: Avoid UB in pslist.h (NULL + 0)
  - From: Taylor R Campbell
References:
- Re: Avoid UB in pslist.h (NULL + 0)
  - From: Taylor R Campbell
Prev by Date: Re: Avoid UB in pslist.h (NULL + 0)
Next by Date: GSoC 2020 - The NetBSD Foundation - Emulating linux timer.. syscalls
Previous by Thread: Re: Avoid UB in pslist.h (NULL + 0)
Next by Thread: Re: Avoid UB in pslist.h (NULL + 0)
Indexes:
Home | Main Index | Thread Index | Old Index