Re: using the interfaces in ctype.h
On 22-Apr-08, at 3:58 AM, Alan Barrett wrote:
> On Mon, 21 Apr 2008, Greg A. Woods; Planix, Inc. wrote:
>> Besides, the standards don't, so far as I can tell, require
>> implementations to always return zero for all the is*() APIs when EOF
>> is passed to them. This whole "the mask prevents the implementation
>> from distinguishing between 0xFF and EOF" claim is completely bogus.
>> It just doesn't matter what these functions return when passed EOF --
>> their result in that case is undefined anyway.
> Since you are the only person who appears to believe the above claims,
> please justify them with detailed references to sections in the
On further reading I'll retract that claim. Sorry I was a little bit
out of line in what I said there before I had fully considered the
implications of LC_CTYPE support and the implications of ISO-8859.
I do now seem to remember thinking a very long time ago when I first
encountered ISO-8859 that there were going to be problems for some
implementations due to the unfortunate use of the 0xFF code for a
Unfortunately it seems the C99 standard has particularly obtuse
wording that dances around the subject. I can't find anything
definitive without quoting massive amounts of diversely spread
references, but then again the standard is so careful to avoid saying
anything specific about any implementation details that it even dances
around defining EOF such that under their definition it can be pretty
much any negative integer value if I'm reading things right (e.g. in
The Single UNIX Specification similarly waffles about EOF, but it does
at least imply that the is*() APIs should return zero for anything
other than the kind of character they're supposed to match. Since EOF
is by definition never a valid character of any type, that implies
they must all return zero when passed EOF. Using a mask in the way I
suggested works fine for ASCII, of course, as well as for some of the
ISO-8859 charsets, but unfortunately not all of them, and not even
ISO-8859-1.
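
To make the collision concrete, here is a minimal sketch, assuming the
common EOF == -1 and a two's complement int. The table and the names
demo_table and masked_isalpha are invented for illustration; they are
not taken from any real libc.

```c
#include <stdio.h>   /* for EOF */

/* Hypothetical 256-entry table: nonzero means "alphabetic".
 * Only entry 0xFF is set, standing in for a charset that puts
 * a letter at that code point. */
static const unsigned char demo_table[256] = { [0xFF] = 1 };

/* Mask-only lookup: the index is always in bounds, but every
 * negative value is folded onto some unsigned char value. */
static int masked_isalpha(int c)
{
    return demo_table[c & 0xFF];
}
```

Under those assumptions masked_isalpha(EOF) and masked_isalpha(0xFF)
both index entry 0xFF, so both report a match and the caller cannot
tell end-of-file from a real letter.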
Now I think I better understand the #if USE_ASCII conditional code in
Darwin's ctype.h. I like this aspect of the Darwin implementation the
best I think. It provides the very best possible performance for the
simple ASCII-only case, and then jumps right into using functions
(inline where possible) for anything beyond ASCII. There's still a
short-circuit in the core function used by some of the API functions
where ASCII-only values are handled by a direct array access, thus
avoiding a second function call.
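
Roughly, the shape I mean looks like the sketch below. This is not
Darwin's code: the sk_ names, the two flag bits, the nearly empty
table, and the stub standing in for the locale-aware path are all
made up to show the structure only.

```c
#define SK_UPPER 0x01u   /* hypothetical classification bits */
#define SK_LOWER 0x02u

/* Hypothetical ASCII-only table; just two entries for brevity. */
static const unsigned char sk_ascii[128] = {
    ['A'] = SK_UPPER,
    ['a'] = SK_LOWER,
};

/* Stub for the slower, locale-aware lookup. A real one would
 * consult the current LC_CTYPE data; this one matches nothing. */
static int sk_locale_lookup(int c, unsigned mask)
{
    (void)c;
    (void)mask;
    return 0;
}

/* Core function: ASCII values short-circuit to a direct array
 * access; everything else, EOF included, takes the slow path. */
static inline int sk_istype(int c, unsigned mask)
{
    if (c >= 0 && c < 128)
        return (sk_ascii[c] & mask) != 0;
    return sk_locale_lookup(c, mask);
}

/* One of the API-level wrappers built on the core function. */
static inline int sk_isupper(int c)
{
    return sk_istype(c, SK_UPPER);
}
```

The range check doubles as the bounds check, so nothing outside
0..127 ever reaches the array.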
The OpenBSD implementation is probably second best, though it is by
far the most readable and easiest to understand. It uses simple
inline function calls to avoid the issue of multiple references to the
macro argument when testing to see if the value is EOF before masking
it and using the masked value as the array index. At first glance I
think the OpenBSD implementation also has the advantage of being 100%
compatible with all the other innards of NetBSD's libc, and so it's
probably the easiest one to borrow should NetBSD also decide that
it's better to be safe and prevent out-of-bounds array accesses.
(The current OpenBSD implementation may still return bogus values for
some other negative numbers though.) Perhaps I'll slurp in their code
as a starting base in my tree and see how it performs.
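
For the record, the approach boils down to something like the
following. It is my own sketch of the idea, not a copy of the OpenBSD
header; sk_table, SK_ALPHA and sk_isalpha are hypothetical names and
the table marks only two entries.

```c
#include <stdio.h>   /* for EOF */

#define SK_ALPHA 0x01   /* hypothetical classification bit */

/* Hypothetical table; only 'A' and 0xFF are marked for brevity. */
static const unsigned char sk_table[256] = {
    ['A']  = SK_ALPHA,
    [0xFF] = SK_ALPHA,
};

/* An inline function evaluates its argument exactly once, so it
 * can test for EOF first and only then reduce the value to an
 * in-bounds index. */
static inline int sk_isalpha(int c)
{
    if (c == EOF)
        return 0;
    return sk_table[(unsigned char)c] & SK_ALPHA;
}
```

Here sk_isalpha(EOF) is zero while sk_isalpha(0xFF) is not, and the
index can never leave the table. A value such as -2 still gets folded
onto entry 0xFE, though, which is the kind of bogus answer for other
negative numbers mentioned above.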
Greg A. Woods; Planix, Inc.