tech-userlevel archive


Re: using the interfaces in ctype.h




On 21-Apr-08, at 2:07 PM, der Mouse wrote:
That makes about as much sense as saying that getc assuming its FILE *
parameter is non-nil is not safe, or that mktemp is not safe when
passed (void *)&main.  Large swaths of libc will do odd things when
passed arguments beyond their interface specifications; it is no part
of libc's mandate to detect such calls.  I'm not convinced "not safe"
is really a fair term to apply to such behaviour.  I certainly don't
see it as worth significant effort to do anything in particular in such
cases.

The problem here, I think, is that it's an array access that's for all intents and purposes done within the application, not within libc. This makes compiler warnings about it, and perhaps crashes caused by it, somewhat more confusing for at least some applications programmers.

Witness the start of this thread where it was suggested that the caller of the "ctype" APIs kill the ability of the compiler to detect incorrect usage while at the same time preventing the ability of the implementation to distinguish between EOF and 0xFF.

I don't think anyone has suggested application-level casting for cases
where the argument might be EOF; that would be severely broken - which
is why the implementation, which must be prepared to handle EOF
arguments, should not do it (nor anything equivalent).

I take it that such a suggestion was exactly the case: "cast ctype arguments to unsigned char and not int"

Either way you look at it, (and no matter where it is), the cast will make EOF==0xFF.
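
To make the collision concrete, here is a minimal sketch. It assumes EOF is -1 and CHAR_BIT is 8, which is typical but not guaranteed, and after_cast() is a made-up helper name, not anything from libc:

```c
#include <limits.h>
#include <stdio.h>   /* EOF */

/*
 * Hypothetical helper: the value a table lookup would see once the
 * argument has been cast to unsigned char, whether the cast is
 * written by the caller or hidden inside the macro.
 */
static int
after_cast(int c)
{
	return (unsigned char)c;
}
```

With those assumptions after_cast(EOF) and after_cast(0xFF) both yield 255, so nothing downstream of the cast can tell the two apart.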

I'm saying that it doesn't matter and that it's better to have the implementation provide an implementation-specific and correct mask instead of relying on the application programmer to get it right. The compiler warning everyone wants doesn't really help all that much (50% or less of cases I'd say), and forcing application programmers to do the cast means the cast is just as likely to be wrong and/or useless. Application programmers should be strongly encouraged to use the API as it is defined and they should not be encouraged to do funny things just to shut up the compiler.

If the cast is done within the implementation then the macro looks and behaves much more like a full-fledged function implementation would (or should), even if it still gets it wrong when the programmer does end up passing EOF by mistake. Besides, the standards don't, so far as I can tell, require implementations to always return zero for all the is*() APIs when EOF is passed to them.

This whole "the mask prevents the implementation from distinguishing between 0xFF and EOF" claim is completely bogus. It just doesn't matter what these functions return when passed EOF -- their result in that case is undefined anyway. They are only required to accept EOF because it is and was common practice to directly pass the unmodified result of something like getc() to them. The program is going to spin in a loop if it doesn't detect EOF anyway, regardless of whether the programmer casts the value to (unsigned char) or not and regardless of whether the implementation masks it to prevent an out-of-bounds array access.
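
A rough sketch of what doing the cast inside the implementation could look like follows. The names my_ctype_, my_ctype_init, MY_DIGIT and MY_ISDIGIT are all invented for illustration; this is not the actual NetBSD table or macro, only the same general shape:

```c
#include <limits.h>

#define MY_DIGIT 0x01	/* hypothetical class bit */

/*
 * Traditional layout: slot 0 answers for EOF (-1) and the following
 * UCHAR_MAX + 1 slots answer for character values 0..UCHAR_MAX.
 */
static unsigned char my_ctype_[1 + (UCHAR_MAX + 1)];

/* Mark the digits; every other entry stays zero. */
static void
my_ctype_init(void)
{
	int c;

	for (c = '0'; c <= '9'; c++)
		(my_ctype_ + 1)[c] = MY_DIGIT;
}

/*
 * The cast lives in the implementation, so the index is always in
 * 0..UCHAR_MAX no matter what int or plain char the caller passes.
 */
#define MY_ISDIGIT(c)	((my_ctype_ + 1)[(unsigned char)(c)] & MY_DIGIT)
```

With this shape a negative plain char can no longer index off the front of the table, and EOF simply gets whatever answer the UCHAR_MAX slot holds.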

I guess the one additional point I should make is that the use of "(_ctype_ + 1)" as the start of the array can be done away with if the index is masked by ~(~0 << CHAR_BIT). That might even speed up these macros, at least on machines which can do the mask faster than they can add one to a pointer. :-)
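
Something like the following is what I have in mind, again only as a sketch with made-up names (FLAT_MASK, FLAT_DIGIT, flat_ctype, flat_ctype_init, FLAT_ISDIGIT). One deliberate change: it uses ~0u rather than ~0, on the assumption that left-shifting a negative int is best avoided since newer C standards leave that undefined:

```c
#include <limits.h>

/* All-ones in the low CHAR_BIT bits; 0xFF when CHAR_BIT is 8. */
#define FLAT_MASK	(~(~0u << CHAR_BIT))

#define FLAT_DIGIT	0x01	/* hypothetical class bit */

/* No extra leading slot for EOF: exactly one entry per char value. */
static unsigned char flat_ctype[UCHAR_MAX + 1];

/* Mark the digits; every other entry stays zero. */
static void
flat_ctype_init(void)
{
	int c;

	for (c = '0'; c <= '9'; c++)
		flat_ctype[c] = FLAT_DIGIT;
}

/* Index straight into the table; the mask keeps the index in bounds. */
#define FLAT_ISDIGIT(c)	(flat_ctype[(c) & FLAT_MASK] & FLAT_DIGIT)
```

Because the mask is unsigned, a negative argument such as EOF is converted modulo UINT_MAX + 1 before the AND, so it lands on the last slot instead of one before the first.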

--
                                        Greg A. Woods; Planix, Inc.
                                        <woods%planix.ca@localhost>


