tech-userlevel archive


Re: curses vs non-ASCII

>>>> [...] non-ASCII octets in strings are getting [] lost [...]
>>> Have you called setlocale(3) appropriately?
>> No.  I was not calling setlocale() at all.  [...]
> In that case, it was using the "C" standard locale, which is using
> ASCII only, no surprise.

Well, it surprised me.  It is a regression as compared to 1.4T and
4.0.1, each of which works exactly the way I want here, with 0x80
through 0xff all being treated as printables.  (Test program below.)
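
To make the locale dependence concrete, here is a minimal sketch of how
one might ask the C library which of 0x80 through 0xff it considers
printable under a given LC_CTYPE.  The helper names
count_printable_high and count_in_locale are made up for illustration;
the answer depends on the C library and on which locales are installed,
so no particular output is claimed.

```c
#include <ctype.h>
#include <locale.h>
#include <stddef.h>

/*
 * Hypothetical helper: count how many octets in 0x80..0xff the current
 * LC_CTYPE locale reports as printable via isprint(3).
 */
static int count_printable_high(void)
{
    int n = 0;

    for (int c = 0x80; c <= 0xff; c++) {
        /* c is within unsigned char range, so isprint(c) is well defined */
        if (isprint(c))
            n++;
    }
    return n;
}

/*
 * Hypothetical helper: select LC_CTYPE by name, then count.
 * A name of "" means "take the locale from the environment".
 * Returns -1 if setlocale(3) rejects the name.
 */
static int count_in_locale(const char *name)
{
    if (setlocale(LC_CTYPE, name) == NULL)
        return -1;
    return count_printable_high();
}
```

Calling count_in_locale("C") and then count_in_locale("") from a small
main() shows the difference between never calling setlocale() and
honouring the environment; a program that never calls setlocale() runs
in the "C" locale.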

> This is not about multibyte locales even.  You have quoted the
> relevant part of the mbrtowc man page even...

I think the only part of mbrtowc(3) I've quoted in this thread is

     The behaviour of mbrtowc() is affected by the LC_CTYPE category of the
     current locale.

which, while possibly relevant, is not obviously so.  It is not at all
clear to me from that sentence that mbrtowc will drop octets
corresponding to non-printing characters - and, indeed, I'm not sure it
does; the issue may be in curses, which does something special for
(wide) characters when wcwidth returns zero.  (I haven't followed the
code enough to really understand just _how_ the behaviour is different.)

However, I do have reason to think mbrtowc is part of the precipitate
here; specifically, by default, the 5.2 curses not only drops the
non-ASCII octet, but sometimes eats the following octet as well.  Even
without any setlocale().
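
For anyone wanting to poke at the mbrtowc side in isolation, here is a
minimal sketch of a decode loop that never lets a rejected octet take
its neighbour with it.  This is not a claim about what curses does
internally; count_units is a hypothetical helper, and treating a
rejected octet as one unit of its own is just one possible policy.

```c
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/*
 * Hypothetical helper: walk buf with mbrtowc(3) and count the units
 * produced.  An octet that mbrtowc rejects is counted as one unit and
 * skipped by itself, so the following octet is never consumed with it.
 */
static size_t count_units(const char *buf, size_t len)
{
    mbstate_t st;
    size_t units = 0, i = 0;

    memset(&st, 0, sizeof st);          /* initial conversion state */
    while (i < len) {
        wchar_t wc;
        size_t r = mbrtowc(&wc, buf + i, len - i, &st);

        if (r == (size_t)-1 || r == (size_t)-2) {
            /* invalid or truncated sequence: keep the octet, resync */
            memset(&st, 0, sizeof st);
            r = 1;
        } else if (r == 0) {
            r = 1;                      /* an embedded NUL occupies one octet */
        }
        i += r;
        units++;
    }
    return units;
}
```

Resetting the mbstate_t after a failure matters: once mbrtowc has
returned (size_t)-1 the conversion state is unspecified, and carrying
it into the next call is one way a following octet could go missing.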

>> (I haven't tested whether it gives me the rest of what I want, which
>> is the 0x80-0x9f octets also being treated as single-octet
>> printables.)
> But they are not printable in ISO 8859-1, they are control
> characters.

That's right.  I'm not trying to use straight-up 8859-1.  What I'm
trying to use here is a superset of 8859-1, one in which 0x80 through
0xff are all printable - basically, the behaviour 1.4T and 4.0.1 (and
presumably everything in between) give me by default.  8859-1 is, I
would say, mostly acceptable but not ideal.

>> [...] consider all of 0x20-0xff as printable, [...]
> Most locales have at least 0x7f as control character...

Doh!  My sloppiness.  Yes, 0x20-0x7e and 0x80-0xff.
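
Spelled out as code, the set I mean is the following.  This is only a
sketch of the classification; octet_is_printable is a name made up
here, not anything curses or libc provides.

```c
#include <stdbool.h>

/*
 * Hypothetical predicate: true for 0x20-0x7e and 0x80-0xff,
 * false for the C0 controls (0x00-0x1f) and for 0x7f (DEL).
 */
static bool octet_is_printable(unsigned char c)
{
    return (c >= 0x20 && c <= 0x7e) || c >= 0x80;
}
```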

/~\ The ASCII				  Mouse
\ / Ribbon Campaign
 X  Against HTML
/ \ Email!	     7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B

Here's that test program, in case anyone cares.

#include <curses.h>

int main(void)

(Of course, you need a display device which treats all those octets as
printable in order for this to be useful.  I'm using a terminal
emulator that can be configured that way for that part.)
