Re: curses vs non-ASCII
>>>> [...] non-ASCII octets in strings are getting lost [...]
>>> Have you called setlocale(3) appropriately?
>> No. I was not calling setlocale() at all. [...]
> In that case, it was using the "C" standard locale, which is using
> ASCII only, no surprise.
Well, it surprised me. It is a regression as compared to 1.4T and
4.0.1, each of which works exactly the way I want here, with 0x80
through 0xff all being treated as printables. (Test program below.)
> This is not about multibyte locales even. You have quoted the
> relevant part of the mbrtowc man page even...
I think the only part of mbrtowc(3) I've quoted in this thread is

	The behaviour of mbrtowc() is affected by the LC_CTYPE category of the
	current locale.

which, while possibly relevant, is not obviously so; nothing in it
tells me that mbrtowc will drop octets corresponding to non-printing
characters. Indeed, I'm not sure it does; the issue may be in curses,
which does something special for (wide) characters when wcwidth returns
zero. (I haven't followed the code closely enough to understand just
_how_ the behaviour differs.)
However, I do have reason to think mbrtowc is part of what precipitates
this: by default, the 5.2 curses not only drops the non-ASCII octet but
sometimes eats the following octet as well, even without any
setlocale() call.
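One way to tell the two suspects apart (mbrtowc itself versus curses's special handling of characters whose wcwidth is zero) is to ask mbrtowc and wcwidth directly what they make of each high octet under the current locale, outside curses entirely. The following is a minimal sketch assuming a POSIX system that provides wcwidth(); probe_octet and report_high_octets are hypothetical helper names, not part of curses or libc:

```c
#define _XOPEN_SOURCE 700   /* expose wcwidth() in <wchar.h> */
#include <locale.h>         /* setlocale(), for the caller */
#include <stdio.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: decode one octet with mbrtowc() under the current
 * LC_CTYPE locale and return mbrtowc()'s result unchanged:
 *   (size_t)-1  not a valid character in this locale (EILSEQ)
 *   (size_t)-2  start of an incomplete multibyte sequence
 *   0           the octet is NUL
 *   1           decoded into *wc */
static size_t probe_octet(unsigned char c, wchar_t *wc)
{
    mbstate_t st;
    char buf[1];

    memset(&st, 0, sizeof st);      /* fresh conversion state per probe */
    buf[0] = (char)c;
    return mbrtowc(wc, buf, 1, &st);
}

/* Hypothetical helper: for each octet 0x80-0xff, report whether mbrtowc()
 * accepts it and, if so, what wcwidth() says about the wide character. */
static void report_high_octets(FILE *out)
{
    for (unsigned int c = 0x80; c <= 0xff; c++) {
        wchar_t wc = 0;
        size_t n = probe_octet((unsigned char)c, &wc);

        if (n == (size_t)-1)
            fprintf(out, "0x%02x: rejected by mbrtowc\n", c);
        else if (n == (size_t)-2)
            fprintf(out, "0x%02x: incomplete multibyte sequence\n", c);
        else
            fprintf(out, "0x%02x: U+%04lX, wcwidth %d\n",
                    c, (unsigned long)wc, wcwidth(wc));
    }
}
```

Calling report_high_octets(stdout) once with no setlocale() call (the "C" locale) and once after setlocale(LC_CTYPE, "") would show, octet by octet, whether the loss already happens at the mbrtowc stage or only later inside curses. An octet reported as an incomplete multibyte sequence is one plausible way a decoder could end up consuming the following octet too, though that is a guess, not something this sketch establishes for the 5.2 curses.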
>> (I haven't tested whether it gives me the rest of what I want, which
>> is the 0x80-0x9f octets also being treated as single-octet
> But they are not printable in ISO 8859-1, they are control characters.
That's right. I'm not trying to use straight-up 8859-1. What I'm
trying to use here is a superset of 8859-1, one in which 0x80 through
0xff are all printable - basically, the behaviour 1.4T and 4.0.1 (and
presumably everything in between) give me by default. 8859-1 is, I
would say, mostly acceptable but not ideal.
>> [...] consider all of 0x20-0xff as printable, [...]
> Most locales have at least 0x7f as control character...
Doh! My sloppiness. Yes, 0x20-0x7e and 0x80-0xff.
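The octet set being asked for is easy to state in code, which makes it easy to compare against what a given locale's <ctype.h> tables report. A minimal sketch follows; wanted_printable and list_disagreements are hypothetical names, and isprint() is used only as a stand-in for the locale's classification, not necessarily what curses consults internally:

```c
#include <ctype.h>
#include <stdio.h>

/* The set described above: 0x20-0x7e plus 0x80-0xff, i.e. every octet
 * except the C0 controls (0x00-0x1f) and DEL (0x7f). */
static int wanted_printable(unsigned char c)
{
    return (c >= 0x20 && c <= 0x7e) || c >= 0x80;
}

/* Hypothetical helper: print (if out is non-NULL) every octet on which
 * the current locale's isprint() disagrees with wanted_printable(),
 * and return how many such octets there are. */
static int list_disagreements(FILE *out)
{
    int n = 0;

    for (int c = 0; c <= 0xff; c++) {
        int locale_says = isprint(c) != 0;  /* c is in unsigned char range */

        if (locale_says != wanted_printable((unsigned char)c)) {
            if (out != NULL)
                fprintf(out, "0x%02x: isprint=%d, wanted=%d\n",
                        c, locale_says, !locale_says);
            n++;
        }
    }
    return n;
}
```

A program that never calls setlocale() runs in the "C" locale, where typically every octet from 0x80 up shows as a disagreement. After setlocale(LC_CTYPE, "") with an ISO 8859-1 locale, the remaining disagreements would be expected to be roughly the 0x80-0x9f control range mentioned above.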
/~\ The ASCII Mouse
\ / Ribbon Campaign
X Against HTML mouse%rodents-montreal.org@localhost
/ \ Email! 7D C8 61 52 5D E7 2D 39 4E F1 31 3E E8 B3 27 4B
Here's that test program, in case anyone cares.
(Of course, you need a display device which treats all those octets as
printable in order for this to be useful. I'm using a terminal
emulator that can be configured that way for that part.)