tech-userlevel archive
Re: UTF-8 capable fmt(1)
> - It's unclear what is supposed to happen if you call isspace() with
> values greater than 127; POSIX says isspace() takes "characters".
Historically, NetBSD seems to have gone a little further than that:
  CAVEATS
     The argument to isspace() must be EOF or representable as an unsigned
     char; otherwise, the behavior is undefined.
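
To illustrate that caveat, here is a minimal sketch of the usual idiom for
feeding octets from a char buffer to isspace(): convert to unsigned char
first, so that on platforms where plain char is signed, octets of 0x80 and
above do not arrive as negative ints. The helper names octet_isspace and
count_space_octets are hypothetical, made up for this example; what
isspace() reports for such octets still depends on the current locale.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: classify a single octet taken from a char buffer. */
static int
octet_isspace(char c)
{
    /*
     * Convert to unsigned char before the call so the argument is always
     * representable as an unsigned char, as the CAVEATS text requires.
     */
    return isspace((unsigned char)c) != 0;
}

/* Hypothetical helper: count whitespace octets in buf[0..len). */
static size_t
count_space_octets(const char *buf, size_t len)
{
    size_t i, n = 0;

    for (i = 0; i < len; i++)
        if (octet_isspace(buf[i]))
            n++;
    return n;
}
```

Without the conversion, passing a plain char holding, say, 0xe9 is the
undefined-behaviour case the manual page warns about.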
As for the proposed change, my reaction to it is not very helpful. It
amounts to "I don't like it, but I can tolerate it as long as I can easily
set something to make it go away" - I note the proposal did not explain
what happens to people who want to use something other than UTF-8.
(For me, most often I want "octets map 1-to-1 to characters", usually
for either 8859-1 or a character set like 8859-1 but extended with
printable characters in the 0x80-0x9f positions.)
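
As a rough sketch of why the choice matters to something like fmt(1), which
has to count characters to fill lines: under a 1-to-1 model the character
count is simply the octet count, while under UTF-8 it is not. The function
names below are hypothetical, and the UTF-8 counter assumes well-formed
input; it is not taken from the proposal.

```c
#include <stddef.h>

/* 1-to-1 model (e.g. ISO 8859-1): every octet is exactly one character. */
static size_t
chars_one_to_one(const char *buf, size_t len)
{
    (void)buf;      /* the contents do not affect the count */
    return len;
}

/*
 * UTF-8 model, assuming well-formed input: count every octet that is not
 * a continuation byte (bit pattern 10xxxxxx).
 */
static size_t
chars_utf8(const char *buf, size_t len)
{
    size_t i, n = 0;

    for (i = 0; i < len; i++)
        if (((unsigned char)buf[i] & 0xc0) != 0x80)
            n++;
    return n;
}
```

For example, "café" is the four octets 63 61 66 e9 in 8859-1 and the five
octets 63 61 66 c3 a9 in UTF-8; treating either encoding as the other gives
the wrong count.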
Personally, I think that variable-size characters are an unmitigated
lose, so I have no liking for UTF-8 anything, especially when (as is
all too often the case) it breaks things for other uses. (I've seen at
least one Linux tool that, under circumstances I haven't bothered
probing the envelope of in detail but I suspect relate to octet
sequences that are invalid UTF-8, ignores the rest of its input(!)
silently(!!).)
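
For contrast with that failure mode, here is a minimal sketch of one way a
tool could keep going on invalid UTF-8: treat each offending octet as a
single character of its own and resume scanning at the next octet. This is
only an illustration under that assumption, not what any particular tool
does; utf8_seq_len and count_chars_tolerant are hypothetical names.

```c
#include <stddef.h>

/*
 * Return the length (1..4) of the well-formed UTF-8 sequence starting at
 * s, or 0 if the octets there are invalid or truncated.
 */
static size_t
utf8_seq_len(const unsigned char *s, size_t avail)
{
    size_t i, need;
    unsigned long cp, min;

    if (avail == 0)
        return 0;
    if (s[0] < 0x80)
        return 1;
    if ((s[0] & 0xe0) == 0xc0) {
        need = 2; cp = s[0] & 0x1f; min = 0x80;
    } else if ((s[0] & 0xf0) == 0xe0) {
        need = 3; cp = s[0] & 0x0f; min = 0x800;
    } else if ((s[0] & 0xf8) == 0xf0) {
        need = 4; cp = s[0] & 0x07; min = 0x10000;
    } else {
        return 0;   /* stray continuation byte, or 0xf8..0xff */
    }
    if (avail < need)
        return 0;   /* truncated sequence */
    for (i = 1; i < need; i++) {
        if ((s[i] & 0xc0) != 0x80)
            return 0;
        cp = (cp << 6) | (s[i] & 0x3f);
    }
    /* Reject overlong forms, surrogates, and values past U+10FFFF. */
    if (cp < min || cp > 0x10ffff || (cp >= 0xd800 && cp <= 0xdfff))
        return 0;
    return need;
}

/*
 * Count characters in buf[0..len). An invalid octet counts as one
 * character and is tallied in *invalid; the rest of the input is still
 * processed.
 */
static size_t
count_chars_tolerant(const unsigned char *buf, size_t len, size_t *invalid)
{
    size_t i = 0, n = 0, k;

    *invalid = 0;
    while (i < len) {
        k = utf8_seq_len(buf + i, len - i);
        if (k == 0) {
            (*invalid)++;
            k = 1;  /* skip just the offending octet */
        }
        i += k;
        n++;
    }
    return n;
}
```

A caller that cares can look at the invalid count and complain; the point
is only that nothing after the bad octet gets dropped.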
/~\ The ASCII Mouse
\ / Ribbon Campaign
X Against HTML mouse%rodents-montreal.org@localhost
/ \ Email! 7D C8 61 52 5D E7 2D 39 4E F1 31 3E E8 B3 27 4B