NetBSD-Users archive


Re: toupper and warnings



Johnny Billquist <bqt%update.uu.se@localhost> writes:

> On 2021-05-06 13:06, Greg Troxel wrote:
>>
>> Johnny Billquist <bqt%update.uu.se@localhost> writes:
>>
>>>> See CAVEATS in ctype(3).
>>>
>>> Right. But is gcc really smart enough to understand at compile time if
>>> something else than -1 is the negative value, and that toupper in fact
>>> is more limited than what the signature says?
>>>
>>> The *signature* of the function is int toupper(int). If you pass a
>>> char to that, I can't see that there would ever be a warning about any
>>> problems.
>>
>> The signature is that it takes an int, but the specification is that if
>> the value of the int is other than EOF or something representable as
>> unsigned char (projecting to the implementation, meaning -1 is ok and
>> 0..255 is ok), then you get UB.
>
> Right. My question is just how on earth gcc would know this? There is
> nothing in the actual declaration that tells (or even can) tell
> this. So this would then (again) be an example of gcc knowing more
> about the function than is actually visible.

What is going on is that there is a header file that defines toupper as
a macro (#define).  After expanding that, gcc sees code that uses a char
(which is signed on this platform) as an array subscript, and that can
reasonably be expected to allow out-of-bounds reads.  AFAIK gcc is not
bringing in any knowledge of toupper itself.
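
To make that concrete, here is a minimal sketch of a table-lookup macro
in the style of a ctype header.  The names (demo_upper_tab, demo_init,
DEMO_TOUPPER, demo_toupper_checked, demo_toupper_char) are made up for
illustration and are not the actual libc definitions; the sketch also
assumes EOF is -1 and an ASCII-compatible character set.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical stand-in for a libc case-mapping table:
 * slot 0 is for EOF (assumed to be -1), slots 1..256 are for the
 * unsigned char values 0..255. */
static short demo_upper_tab[1 + UCHAR_MAX + 1];

static void demo_init(void)
{
    demo_upper_tab[0] = -1;                 /* EOF maps to itself */
    for (int i = 0; i <= UCHAR_MAX; i++)    /* assumes ASCII a..z */
        demo_upper_tab[i + 1] =
            (short)((i >= 'a' && i <= 'z') ? i - 'a' + 'A' : i);
}

/* Macro in the style of a ctype header: a bare table lookup.
 * With a plain char argument holding, say, -23, this would read
 * demo_upper_tab[-22], which is out of bounds.  The compiler only
 * sees "char used as a subscript" after expansion. */
#define DEMO_TOUPPER(c) ((int)((demo_upper_tab + 1)[(c)]))

/* Same lookup, but with the documented domain spelled out as a
 * precondition: EOF or a value representable as unsigned char. */
static int demo_toupper_checked(int c)
{
    assert(c >= -1 && c <= UCHAR_MAX);
    return DEMO_TOUPPER(c);
}

/* The usual fix at the call site: convert the char to unsigned char
 * first, so the subscript is always in 0..UCHAR_MAX. */
static int demo_toupper_char(char ch)
{
    return demo_toupper_checked((unsigned char)ch);
}
```

After calling demo_init(), demo_toupper_char() stays inside the table
for every possible char value, whereas passing a negative plain char
straight to DEMO_TOUPPER would not.  With gcc, the warning that
typically fires for the unconverted case is -Wchar-subscripts.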

> What if I wrote my own function called toupper, which was defined for
> the full range of an int? Would gcc then understand that this is a
> different toupper that it shouldn't warn about?

toupper is specified by C99.  So yes, you could implement a version that
had safer behavior when it is formally undefined.  If gcc had the
specification expressed somehow, and could do static analysis, and gave
a warning that "call to toupper could lead to undefined behavior", then I
think that would be great.  But it isn't doing that -- and that's more
like UBsan than a compiler anyway.
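
A version with defined behavior over the whole int range could look
roughly like the sketch below.  safe_toupper and
safe_toupper_selfcheck are hypothetical names, not standard functions,
and returning out-of-domain values unchanged is just one possible
choice.

```c
#include <assert.h>
#include <ctype.h>
#include <limits.h>
#include <stdio.h>      /* EOF */

/* Hypothetical wrapper: defined for every int.  Inside the domain
 * that the standard allows (EOF or 0..UCHAR_MAX) it defers to the
 * real toupper; anything else is returned unchanged instead of
 * being undefined behavior. */
static int safe_toupper(int c)
{
    if (c == EOF || (c >= 0 && c <= UCHAR_MAX))
        return toupper(c);
    return c;
}

/* Small usage check, assuming the default "C" locale. */
static void safe_toupper_selfcheck(void)
{
    assert(safe_toupper('a') == 'A');    /* normal case */
    assert(safe_toupper(-42) == -42);    /* negative, passed through */
    assert(safe_toupper(1000) == 1000);  /* too large, passed through */
}
```

Because the range check happens before the call, a signed char can be
passed directly without a cast, at the cost of an extra comparison per
call.  Note that a negative char such as (char)0xE9 is then left
unmapped rather than treated as the character 0xE9.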

A lot of trouble is caused by people writing code that's ok with the
implementation in front of them, but that ventures into UB per the
standard.  So I don't think such code should be accommodated in general.



