At Thu, 06 May 2021 07:06:43 -0400, Greg Troxel <gdt%lexort.com@localhost> wrote:
Subject: Re: toupper and warnings
>
> Johnny Billquist <bqt%update.uu.se@localhost> writes:
>
> >> See CAVEATS in ctype(3).
> >
> > Right. But is gcc really smart enough to understand at compile time if
> > something else than -1 is the negative value, and that toupper in fact
> > is more limited than what the signature says?
> >
> > The *signature* of the function is int toupper(int). If you pass a
> > char to that, I can't see that there would ever be a warning about any
> > problems.
Well the warning is actually an almost accidental side effect due to the
implementation as a macro and due to the way the particular way the
macro is expanded into an array reference.  The potential to access an
array with a negative index is obvious to the compiler in the current
standard NetBSD implementation.
Note too that actually implementing some of the ctype.h style interfaces
as real functions on a system using signed chars could be problematic as
they could then not internally distinguish between -1 and 0xFF when
passed a plain un-cast signed char value (a value of 0xFF is of course a
valid character, but due to the default integer promotions for all
function parameters the sign will be extended and the result will be
(int) -1).  This matters for iscntrl(), isalpha(), and potentially also
toupper() and tolower() depending on exactly how the return value is
used.
> To answer Greg Woods, It is not "conservative" to silently accept
> UB-causing inputs and turn them into UB array reads.
Well I don't actually do that.
Which is to say my implementation's goal is to turn them into properly
promoted values and access the flags array in such a way that this does
not and cannot "cause" UB.  My implementation carefully tests for a -1
and then if it is not -1 it uses a cast and assignment to an unsigned
char, the value of which is then used to access the array.  I.e. totally
safe _and_ UB-free.
Thus I take the conservative approach of not causing anything bad or
unexpected to happen, including not causing UBSAN to abort the program.
The only bad thing is that I do is to silently coerce truly badly
behaving code into continuing to work without complaint, even from
UBSAN.
My test code for my implementation is here:
    https://github.com/robohack/experiments/blob/master/tctype.c
So far it's only been tested with a Clang and a range of GCC versions.
I can send the patch that adds it to -current too if you'd like to see
it in-situ.
> Another approach would be for NetBSD to adust the code to bounds check
> before indexing and do something if the value is out of bounds.
Yes, and my implementation could also easily be adjusted to do that.
>   Options
> are returning false, returning a random bit, calling abort(3), trying to
> send all of the user's files to a random IP addreess, and so on.  All
> are allowed by the spec :-)
Yeah, "Undefined Behaviour" should be undefined -- i.e. removed from the
spec -- i.e. become either fully defined or at least implementation
defined.  It is not helpful at all -- it was a very VERY bad idea.
E.g. for ctype.h interfaces the spec should just say that values outside
the recognized range will simply be truncated as if by assignment to an
unsigned char.
> Probably returning false with an envvar option to abort is best, similar
> to how UB from mutex ops is handled.
On the other hand perhaps this would better be a job for the UBSAN
module to do, but perhaps that's just because I've become much more
comfortable with using UBSAN with Clang in recent years.  SIGILL for the
losers.  I'm not sure how my implementation could be so hooked into
UBSAN tests though, especially in a less compiler-dependent way.
>   When we started with abort it
> seemed there was a vast amount of incorrect code written and debugged on
> other operating systems that allow code specified to hav UB.
I haven't had quite that experience, though there have been some rare
but stunning examples of problematic code -- though all I can remember
were actually caught while porting the code to run on NetBSD/alpha, or
in a few cases when porting to platforms that actually SIGBUS on
unaligned accesses (like sparc IIRC?).
What I am pretty sure of though is that there's a vast difference
between the massive number of warnings spit out by the compiler vs. the
relatively low number of actual cases of passing values outside of -1..255.
We certainly wouldn't want to claim UB and abort for all of the warnings!
--
					Greg A. Woods <gwoods%acm.org@localhost>
Kelowna, BC     +1 250 762-7675           RoboHack <woods%robohack.ca@localhost>
Planix, Inc. <woods%planix.com@localhost>     Avoncote Farms <woods%avoncote.ca@localhost>
Attachment:
pgpka1R3YO42t.pgp
Description: OpenPGP Digital Signature