tech-userlevel archive


Re: UTF-8 capable fmt(1)



> - It's unclear what is supposed to happen if you call isspace() with
>   values greater than 127; POSIX says isspace() takes "characters".

Historically, NetBSD seems to have gone a little further than that:

CAVEATS
     The argument to isspace() must be EOF or representable as an unsigned
     char; otherwise, the behavior is undefined.

As for the proposed change, my reaction to it is not very helpful.  It
amounts to "I don't like it, but I can tolerate it as long as I can easily
set something to make it go away" - I note the proposal did not explain
what happens to people who want to use something other than UTF-8.
(For me, most often I want "octets map 1-to-1 to characters", usually
for either 8859-1 or a character set like 8859-1 but extended with
printable characters in the 0x80-0x9f positions.)

Personally, I think that variable-size characters are an unmitigated
lose, so I have no liking for UTF-8 anything, especially when (as is
all too often the case) it breaks things for other uses.  (I've seen at
least one Linux tool that, under circumstances I haven't bothered
probing the envelope of in detail but I suspect relate to octet
sequences that are invalid UTF-8, ignores the rest of its input(!)
silently(!!).)

/~\ The ASCII				  Mouse
\ / Ribbon Campaign
 X  Against HTML		mouse%rodents-montreal.org@localhost
/ \ Email!	     7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B

