tech-userlevel archive


Re: curses vs non-ASCII



>>>> [...] non-ASCII octets in strings are getting [] lost [...]
>>> Have you called setlocale(3) appropriately?
>> No.  I was not calling setlocale() at all.  [...]
> In that case, it was using the "C" standard locale, which is using
> ASCII only, no surprise.

Well, it surprised me.  It is a regression as compared to 1.4T and
4.0.1, each of which works exactly the way I want here, with 0x80
through 0xff all being treated as printables.  (Test program below.)
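For reference, the locale a program is actually running under can be queried without changing it. The following is only a minimal sketch; the helper names current_ctype_locale and adopt_environment_ctype are made up for illustration and are not part of any library.

```c
#include <locale.h>

/* Return the name of the LC_CTYPE locale currently in effect.
 * Passing NULL queries the locale without changing it.  A program
 * that has never called setlocale() starts out in the "C" locale. */
const char *current_ctype_locale(void)
{
    return setlocale(LC_CTYPE, NULL);
}

/* Hypothetical helper: switch LC_CTYPE to whatever the environment
 * (LC_ALL, LC_CTYPE, LANG) asks for.  Returns 1 on success, 0 if
 * the requested locale is not available. */
int adopt_environment_ctype(void)
{
    return setlocale(LC_CTYPE, "") != NULL;
}
```

Calling current_ctype_locale() before and after adopt_environment_ctype() shows whether the environment's locale was picked up at all.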

> This is not about multibyte locales even.  You have quoted the
> relevant part of the mbrtowc man page even...

I think the only part of mbrtowc(3) I've quoted in this thread is

     The behaviour of mbrtowc() is affected by the LC_CTYPE category of the
     current locale.

which, while possibly relevant, is not obviously relevant; it is not at
all clear to me from that that mbrtowc will drop octets corresponding
to non-printing characters - and, indeed, I'm not sure it is; the issue
may be in curses, which does something special for (wide) characters
when wcwidth returns zero.  (I haven't followed the code enough to
really understand just _how_ the behaviour is different.)

However, I do have reason to think mbrtowc is part of the precipitate
here; specifically, by default, the 5.2 curses not only drops the
non-ASCII octet, but sometimes eats the following octet as well.  Even
without any setlocale().
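One way to separate mbrtowc's behaviour from whatever curses does on top of it is to feed it single octets directly. This is a sketch only; probe_octet is a hypothetical helper name, and what it returns for 0x80 through 0xff depends on the C library and the current locale, so no results are claimed here.

```c
#include <string.h>
#include <wchar.h>

/* Feed a single octet to mbrtowc() with a fresh conversion state.
 * Returns mbrtowc()'s result unchanged:
 *   1           the octet is a complete character (stored in *out)
 *   0           the octet is the null character
 *   (size_t)-1  the octet is invalid in the current locale
 *   (size_t)-2  the octet could begin a longer multibyte sequence
 */
size_t probe_octet(unsigned char octet, wchar_t *out)
{
    mbstate_t st;
    char c = (char)octet;

    memset(&st, 0, sizeof st);   /* initial conversion state */
    return mbrtowc(out, &c, 1, &st);
}
```

A return of (size_t)-2 for some octet would be one possible explanation for a following octet being consumed, since the caller would then have to supply more input to complete the character; that is an assumption to be checked, not something established above.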

>> (I haven't tested whether it gives me the rest of what I want, which
>> is the 0x80-0x9f octets also being treated as single-octet
>> printables.)
> But they are not printable in ISO 8859-1, they are control
> characters.

That's right.  I'm not trying to use straight-up 8859-1.  What I'm
trying to use here is a superset of 8859-1, one in which 0x80 through
0xff are all printable - basically, the behaviour 1.4T and 4.0.1 (and
presumably everything in between) give me by default.  8859-1 is, I
would say, mostly acceptable but not ideal.

>> [...] consider all of 0x20-0xff as printable, [...]
> Most locales have at least 0x7f as control character...

Doh!  My sloppiness.  Yes, 0x20-0x7e and 0x80-0xff.
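Spelled out as code, the two notions of "printable" being compared look roughly like this. The function names are invented for illustration; the ranges are the ones stated in this thread.

```c
/* Printable in plain ISO 8859-1: 0x20-0x7e and 0xa0-0xff.
 * 0x7f and 0x80-0x9f are control characters there. */
int printable_8859_1(unsigned char c)
{
    return (c >= 0x20 && c <= 0x7e) || c >= 0xa0;
}

/* Printable in the superset described above: 0x20-0x7e and
 * 0x80-0xff.  Only 0x00-0x1f and 0x7f remain non-printing. */
int printable_superset(unsigned char c)
{
    return (c >= 0x20 && c <= 0x7e) || c >= 0x80;
}
```

The two predicates differ only on 0x80 through 0x9f.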

/~\ The ASCII				  Mouse
\ / Ribbon Campaign
 X  Against HTML		mouse%rodents-montreal.org@localhost
/ \ Email!	     7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B

Here's that test program, in case anyone cares.

#include <curses.h>

/* Deliberately no setlocale() call: this exercises the default locale. */
int main(void)
{
 initscr();
 noecho();
 cbreak();
 clearok(stdscr,TRUE);	/* force a full repaint on the next refresh() */
 move(10,0);
 /* Every octet from 0x80 through 0xff, in order, as one string. */
 addstr(
"\200\201\202\203\204\205\206\207\210\211\212\213\214\215\216\217"
"\220\221\222\223\224\225\226\227\230\231\232\233\234\235\236\237"
"\240\241\242\243\244\245\246\247\250\251\252\253\254\255\256\257"
"\260\261\262\263\264\265\266\267\270\271\272\273\274\275\276\277"
"\300\301\302\303\304\305\306\307\310\311\312\313\314\315\316\317"
"\320\321\322\323\324\325\326\327\330\331\332\333\334\335\336\337"
"\340\341\342\343\344\345\346\347\350\351\352\353\354\355\356\357"
"\360\361\362\363\364\365\366\367\370\371\372\373\374\375\376\377");
 refresh();
 endwin();
 return(0);
}

(Of course, you need a display device which treats all those octets as
printable in order for this to be useful.  I'm using a terminal
emulator that can be configured that way for that part.)

