tech-userlevel archive
Re: curses vs non-ASCII
>>>> [...] non-ASCII octets in strings are getting [] lost [...]
>>> Have you called setlocale(3) appropriately?
>> No. I was not calling setlocale() at all. [...]
> In that case, it was using the "C" standard locale, which is using
> ASCII only, no surprise.
Well, it surprised me. It is a regression as compared to 1.4T and
4.0.1, each of which works exactly the way I want here, with 0x80
through 0xff all being treated as printables. (Test program below.)
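For anyone who wants to check what their libc thinks of the upper half, here is a minimal sketch that does not involve curses at all. The helper name count_printable_high is made up for illustration; it just asks isprint() about each octet from 0x80 through 0xff under a given LC_CTYPE setting. Under "C" I would expect it to report zero on most systems, but that is an assumption about the libc, not something the standard spells out for 8-bit values.

```c
#include <ctype.h>
#include <locale.h>

/*
 * Hypothetical helper (not from any library): count how many octets in
 * 0x80..0xff isprint() accepts once LC_CTYPE is set to locale_name.
 * Returns -1 if the locale cannot be selected.
 * Note: this changes the process-wide LC_CTYPE as a side effect.
 */
int count_printable_high(const char *locale_name)
{
    int c;
    int n = 0;

    if (setlocale(LC_CTYPE, locale_name) == NULL)
        return -1;
    for (c = 0x80; c <= 0xff; c++) {
        /* c is within unsigned char range, so passing it to isprint() is valid */
        if (isprint(c))
            n++;
    }
    return n;
}
```

Calling it with "C" and then with whatever 8859-1 locale name the system provides (the name varies between systems) shows how much of the upper half each one treats as printable.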
> This is not about multibyte locales even. You have quoted the
> relevant part of the mbrtowc man page even...
I think the only part of mbrtowc(3) I've quoted in this thread is
    The behaviour of mbrtowc() is affected by the LC_CTYPE category of the
    current locale.
which, while possibly relevant, is not obviously relevant; it is not at
all clear to me from that that mbrtowc will drop octets corresponding
to non-printing characters - and, indeed, I'm not sure it is; the issue
may be in curses, which does something special for (wide) characters
when wcwidth returns zero. (I haven't followed the code enough to
really understand just _how_ the behaviour is different.)
However, I do have reason to think mbrtowc is part of the precipitate
here; specifically, by default, the 5.2 curses not only drops the
non-ASCII octet, but sometimes eats the following octet as well. Even
without any setlocale().
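To make the "eats the following octet" point concrete, here is a sketch of a decoding loop that cannot do that. This is not what curses does internally (I have not traced that code); decode_octets is a hypothetical helper showing one defensive way to drive mbrtowc(): whenever it reports an invalid or incomplete sequence, consume exactly one octet, reset the conversion state, and carry on.

```c
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/*
 * Hypothetical helper: walk buf with mbrtowc() under the current locale.
 * Returns the number of units found (decoded characters plus raw octets);
 * *raw receives how many octets had to be taken as-is because mbrtowc()
 * rejected them.  A rejected octet never consumes its successor.
 */
size_t decode_octets(const char *buf, size_t len, size_t *raw)
{
    mbstate_t st;
    size_t pos = 0;
    size_t units = 0;

    memset(&st, 0, sizeof st);
    *raw = 0;
    while (pos < len) {
        wchar_t wc;
        size_t n = mbrtowc(&wc, buf + pos, len - pos, &st);

        if (n == (size_t)-1 || n == (size_t)-2) {
            /* invalid or incomplete: take one octet, start over cleanly */
            memset(&st, 0, sizeof st);
            (*raw)++;
            n = 1;
        } else if (n == 0) {
            /* embedded NUL: mbrtowc() returns 0, but one octet was used */
            n = 1;
        }
        pos += n;
        units++;
    }
    return units;
}
```

Whether a given high octet comes back as a character or as a raw octet depends on the libc and the locale, so the sketch only guarantees that nothing after it gets swallowed.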
>> (I haven't tested whether it gives me the rest of what I want, which
>> is the 0x80-0x9f octets also being treated as single-octet
>> printables.)
> But they are not printable in ISO 8859-1, they are control
> characters.
That's right. I'm not trying to use straight-up 8859-1. What I'm
trying to use here is a superset of 8859-1, one in which 0x80 through
0xff are all printable - basically, the behaviour 1.4T and 4.0.1 (and
presumably everything in between) give me by default. 8859-1 is, I
would say, mostly acceptable but not ideal.
>> [...] consider all of 0x20-0xff as printable, [...]
> Most locales have at least 0x7f as control character...
Doh! My sloppiness. Yes, 0x20-0x7e and 0x80-0xff.
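Spelled out as code, the set I mean is below. This is just a sketch of the predicate, with a made-up name (octet_is_printable); it is locale-independent on purpose and is not anything curses or libc provides.

```c
/*
 * Hypothetical predicate for the printable set described above:
 * 0x20-0x7e and 0x80-0xff.  Everything below 0x20, and 0x7f, is control.
 */
int octet_is_printable(unsigned char c)
{
    return (c >= 0x20 && c <= 0x7e) || c >= 0x80;
}
```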
/~\ The ASCII                             Mouse
\ / Ribbon Campaign
 X  Against HTML                mouse%rodents-montreal.org@localhost
/ \ Email!           7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B
Here's that test program, in case anyone cares.
#include <curses.h>

int main(void)
{
    /* Deliberately no setlocale() call: this tests the default behaviour. */
    initscr();
    noecho();
    cbreak();
    clearok(stdscr,TRUE);
    move(10,0);
    /* Every octet from 0x80 through 0xff, in order, as one string. */
    addstr(
        "\200\201\202\203\204\205\206\207\210\211\212\213\214\215\216\217"
        "\220\221\222\223\224\225\226\227\230\231\232\233\234\235\236\237"
        "\240\241\242\243\244\245\246\247\250\251\252\253\254\255\256\257"
        "\260\261\262\263\264\265\266\267\270\271\272\273\274\275\276\277"
        "\300\301\302\303\304\305\306\307\310\311\312\313\314\315\316\317"
        "\320\321\322\323\324\325\326\327\330\331\332\333\334\335\336\337"
        "\340\341\342\343\344\345\346\347\350\351\352\353\354\355\356\357"
        "\360\361\362\363\364\365\366\367\370\371\372\373\374\375\376\377");
    refresh();
    endwin();
    return(0);
}
(Of course, you need a display device which treats all those octets as
printable in order for this to be useful. I'm using a terminal
emulator that can be configured that way for that part.)