tech-userlevel archive


Re: using the interfaces in ctype.h

On 21-Apr-08, at 5:56 AM, Alan Barrett wrote:

On Sun, 20 Apr 2008, Greg A. Woods; Planix, Inc. wrote:
Indeed!  Some implementations were so lame they didn't include the
mask in the implementation of the macro!

If the implementation masked the value before using it, then it would be
unable to distinguish EOF from UCHAR_MAX (typically '\377').
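A minimal sketch of that collision, assuming EOF is -1 (as it commonly is) and a two's-complement machine. The helper names are hypothetical and are not taken from any real header:

```c
#include <limits.h>
#include <stdio.h>

/* Hypothetical helper: the table index a masking macro would compute. */
static int masked_index(int c)
{
    return c & UCHAR_MAX;   /* with 8-bit chars this is c & 0xFF */
}

/* Returns 1 when masking maps EOF and UCHAR_MAX to the same index,
 * which happens whenever EOF is -1 on a two's-complement machine. */
static int eof_collides_with_uchar_max(void)
{
    return masked_index(EOF) == masked_index(UCHAR_MAX);
}
```

Once the argument has been masked, the lookup can no longer tell which of the two values the caller passed.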

True; however, the current implementation doesn't even try to "detect" or "distinguish" EOF, and passing EOF without properly casting and/or masking it will result in an out-of-bounds array access.

the onus falls on the caller to ensure that they don't pass invalid
values; otherwise the implementation is allowed to do anything at all.
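For illustration, the caller-side discipline described here might look like the following sketch. `count_digits` is a hypothetical helper; only `isdigit()` itself comes from the standard library:

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: count the decimal digits in a NUL-terminated
 * string. The cast to unsigned char keeps every argument inside the
 * range the standard allows, even where plain char is signed. */
static size_t count_digits(const char *s)
{
    size_t n = 0;

    for (; *s != '\0'; s++) {
        if (isdigit((unsigned char)*s))
            n++;
    }
    return n;
}
```

Without the cast, a byte such as '\377' in the input could reach `isdigit()` as a negative value on platforms where plain char is signed.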

I certainly agree as a matter of principle and for the most minimal correctness.

However I would hope that it is in the best interests of NetBSD to provide a _safe_ implementation, not just a minimally correct one. I'm not sure what the implications of an out-of-bounds array access are in most of these cases, though apparently it can sometimes cause at least an unexpected abort, and in general I would call that "unsafe".

Simple testing (example code upon request) shows that proper masking (or proper casting) of the macro parameter within its use as an array index will prevent any out-of-bounds access.
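A rough sketch of the kind of test meant here (not the original test code) follows. The macro and function names are hypothetical, and CHAR_BIT is assumed to be 8:

```c
#include <limits.h>

/* Two hypothetical ways to turn an arbitrary int into a table index. */
#define INDEX_BY_MASK(i)  ((i) & UCHAR_MAX)
#define INDEX_BY_CAST(i)  ((unsigned char)(i))

/* Returns 1 if both forms yield an index in 0..UCHAR_MAX for every
 * value from lo to hi inclusive, 0 otherwise. */
static int indexes_stay_in_range(int lo, int hi)
{
    for (long long v = lo; v <= hi; v++) {
        int m = INDEX_BY_MASK((int)v);
        int k = INDEX_BY_CAST((int)v);

        if (m < 0 || m > UCHAR_MAX || k < 0 || k > UCHAR_MAX)
            return 0;
    }
    return 1;
}
```

Calling `indexes_stay_in_range()` over a span that includes negative values checks that neither form can produce an index outside a table of UCHAR_MAX + 1 entries.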

Meanwhile, the compile-time warnings introduced by the current "do nothing special" implementation are useless (i.e. never triggered) whenever the parameter has a signed integer type (even "signed char"), which can hold the value -1 and thus trigger an out-of-bounds array access.

Oh oh, oops, the NetBSD implementations don't seem to include the mask
either!  I didn't realize that!  So sad.  (Which may even mean they
violate the standard's implication that they be able to safely accept
the value of EOF.

Huh?  The NetBSD implementations accept EOF.

Ok, yes, but only with "undefined" behaviour.

Since masking inside the
implementation would violate the requirement to distinguish EOF from
UCHAR_MAX, it's good that NetBSD doesn't do that.

Huh?  That makes no sense whatsoever.

What do you think the current NetBSD implementation does when given EOF anyway? How about when it's given a signed integer variable that has been assigned the value of EOF? How about any other negative number which some user might have thought to be a useful way of extending the error reporting possible in such a situation?

FreeBSD, OpenBSD, and Darwin all seem to have much better
implementations, though they are all using proper (inline) functions
which makes it easier in some ways to do it right.)

I am mildly curious.  In what way are they "better"?

Well they can't as easily be responsible for causing a program to crash, for example.
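As a sketch of the general inline-function style (not copied from any of those systems, and with a hypothetical table and flag names), such an implementation might look like this:

```c
#include <limits.h>
#include <stdio.h>

#define SK_N 0x04   /* hypothetical "is a digit" flag bit */

/* Hypothetical table: slot 0 for EOF, then one slot per unsigned char. */
static unsigned char sk_ctype_[1 + UCHAR_MAX + 1];

static void sk_ctype_init(void)
{
    for (int ch = '0'; ch <= '9'; ch++)
        (sk_ctype_ + 1)[ch] = SK_N;
}

/* EOF gets an explicit answer; any other argument is narrowed to
 * unsigned char, so the index can never leave the table. */
static inline int sk_isdigit(int c)
{
    if (c == EOF)
        return 0;
    return (sk_ctype_ + 1)[(unsigned char)c] & SK_N;
}
```

Because the argument is an ordinary function parameter, it is evaluated exactly once and can be tested against EOF before it is used as an index.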

With GCC on any NetBSD architecture there are only two correct ways to access an array of fewer than UINT_MAX elements (e.g. one of UCHAR_MAX + 1 elements) using a macro:

        int ctype[UCHAR_MAX + 1] = { 0 };       /* valid indexes: 0..UCHAR_MAX */

        /* either mask the index: */
        #define _ctype(i)       ctype[((i) & 0xFF)]

        /* or cast it: */
        #define _ctype(i)       ctype[(unsigned char)(i)]

I recommend the following slightly more portable technique for ctype.h:

        #define _CTYPE_MASK     (~(UINT_MAX << CHAR_BIT))

        #define isdigit(c)      ((int)((_ctype_ + 1)[((c) & _CTYPE_MASK)] & _N))

The only problem here is that, if warnings are enabled, explicitly passing a negative integer constant (such as EOF) to one of these macros produces a slightly confusing warning, though not one much more confusing than the current warning given for parameters of type "char".
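A self-contained sketch of the masked macro follows, using a hypothetical table and `MY_`-prefixed stand-ins for the private header names, so it can be compiled without touching the real ctype.h:

```c
#include <limits.h>

/* Hypothetical stand-ins for the private names in the real header. */
#define MY_CTYPE_MASK  (~(UINT_MAX << CHAR_BIT))
#define MY_N           0x04   /* "is a digit" flag bit; value is arbitrary */

/* Slot 0 is reserved for EOF, the rest for character values. */
static unsigned char my_ctype_[1 + UCHAR_MAX + 1];

static void my_ctype_init(void)
{
    for (int ch = '0'; ch <= '9'; ch++)
        (my_ctype_ + 1)[ch] = MY_N;
}

/* The mask reduces any int argument to 0..UCHAR_MAX before indexing. */
#define my_isdigit(c) \
    ((int)((my_ctype_ + 1)[((c) & MY_CTYPE_MASK)] & MY_N))
```

The mask expression evaluates to UCHAR_MAX, so any argument, negative or too large, is folded into the range of the table before it is used as an index.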

On the other hand, there may be some merit in adopting the OpenBSD or FreeBSD implementations, though I don't yet understand their implications in the face of the NetBSD way of doing wchar_t et al.

                                        Greg A. Woods; Planix, Inc.
