tech-userlevel archive
Re: curses vs non-ASCII
On Thu, Nov 19, 2015 at 12:03:38PM -0500, Mouse wrote:
> >>>> [...] non-ASCII octets in strings are getting [] lost [...]
> >>> Have you called setlocale(3) appropriately?
> >> No. I was not calling setlocale() at all. [...]
> > In that case, it was using the "C" standard locale, which is using
> > ASCII only, no surprise.
>
> Well, it surprised me. It is a regression as compared to 1.4T and
> 4.0.1, each of which works exactly the way I want here, with 0x80
> through 0xff all being treated as printables. (Test program below.)
Well, that would be a serious bug, and I would be somewhat surprised if
netbsd-4 really behaved like that -- it doesn't match any of the classic
character sets.
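The setlocale(3) question at the top of this exchange comes down to this: a program that never calls setlocale() runs in the "C" locale. A minimal sketch of adopting the user's locale at startup, assuming a POSIX C library; the three helper names are hypothetical. In a curses program the setlocale() call would typically come first in main(), before initscr().

```c
#include <assert.h>
#include <locale.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: adopt the LC_CTYPE setting named by the
 * environment (LC_ALL, then LC_CTYPE, then LANG). A program that
 * never calls setlocale() runs in the "C" locale. Returns 1 on
 * success; on failure the current locale is left unchanged. */
static int adopt_environment_ctype(void)
{
    return setlocale(LC_CTYPE, "") != NULL;
}

/* Hypothetical helper: nonzero if LC_CTYPE is the "C" (or "POSIX")
 * locale. */
static int ctype_is_c_locale(void)
{
    const char *name = setlocale(LC_CTYPE, NULL);

    assert(name != NULL);
    return strcmp(name, "C") == 0 || strcmp(name, "POSIX") == 0;
}

/* Hypothetical helper: nonzero if every character in the current
 * LC_CTYPE is a single octet, as it is in the "C" locale. */
static int ctype_is_single_byte(void)
{
    return MB_CUR_MAX == 1;
}
```

Which octets above 0x7f the "C" locale then accepts is up to the C library; the sketch only shows how to get out of it.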
> > This is not about multibyte locales even. You have quoted the
> > relevant part of the mbrtowc man page even...
>
> I think the only part of mbrtowc(3) I've quoted in this thread is
>
> The behaviour of mbrtowc() is affected by the LC_CTYPE category of the
> current locale.
>
> which, while possibly relevant, is not obviously relevant; it is not at
> all clear to me from that that mbrtowc will drop octets corresponding
> to non-printing characters - and, indeed, I'm not sure it is; the issue
> may be in curses, which does something special for (wide) characters
> when wcwidth returns zero. (I haven't followed the code enough to
> really understand just _how_ the behaviour is different.)
mbrtowc does not drop non-printing characters. At most it rejects input
with the high bit set as invalid for US-ASCII. That's a completely different
issue. I don't see anything in curses(3) that cares about printability
either.
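To make that distinction concrete: mbrtowc(3) reports an octet it cannot decode by returning (size_t)-1 and setting errno to EILSEQ. It does not silently discard anything, so whether an octet vanishes depends on what the caller does next. A minimal sketch of a caller that accounts for every octet, assuming only standard C; struct decode_stats and decode_octets() are hypothetical names, not NetBSD curses internals.

```c
#include <assert.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical result record for decode_octets(). */
struct decode_stats {
    size_t chars;     /* characters mbrtowc() accepted */
    size_t invalid;   /* octets mbrtowc() rejected (EILSEQ) */
    size_t truncated; /* trailing octets of an incomplete character */
};

/* Walk a buffer with mbrtowc() under the current LC_CTYPE. An octet
 * that is invalid in this locale is counted and skipped on its own;
 * the octet after it is still examined. */
static struct decode_stats decode_octets(const char *buf, size_t len)
{
    struct decode_stats st = {0, 0, 0};
    mbstate_t state;
    size_t i = 0;

    assert(buf != NULL || len == 0);
    memset(&state, 0, sizeof state);
    while (i < len) {
        /* A null pwc asks mbrtowc() only to measure the character. */
        size_t n = mbrtowc(NULL, buf + i, len - i, &state);

        if (n == (size_t)-1) {
            /* Invalid sequence: record it, reset the conversion
             * state, and advance by exactly one octet. */
            st.invalid++;
            memset(&state, 0, sizeof state);
            i++;
        } else if (n == (size_t)-2) {
            /* The buffer ends in the middle of a character. */
            st.truncated += len - i;
            break;
        } else {
            /* n == 0 means an embedded NUL, which is one octet. */
            st.chars++;
            i += (n == 0) ? 1 : n;
        }
    }
    return st;
}
```

In the "C" locale MB_CUR_MAX is 1, so each call consumes at most one octet; whether an octet such as 0xe9 is accepted there or counted as invalid depends on the C library, but either way it is accounted for.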
> However, I do have reason to think mbrtowc is part of the precipitate
> here; specifically, by default, the 5.2 curses not only drops the
> non-ASCII octet, but sometimes eats the following octet as well. Even
> without any setlocale().
Test case, please.
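Such a test case could start below curses. A sketch of a probe, assuming an X/Open system (hence the _XOPEN_SOURCE definition, which some C libraries need before they declare wcwidth()); probe_widths() is a hypothetical helper, not part of curses(3). It decodes a buffer under the current LC_CTYPE and prints, for each character, how many octets mbrtowc(3) consumed and what wcwidth(3) returned. Run in the same locale and on the same input as the misbehaving curses program, it would show whether octets already go missing at the mbrtowc/wcwidth level or only later inside curses.

```c
#define _XOPEN_SOURCE 700 /* wcwidth() is an X/Open interface */
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical probe: decode buf under the current LC_CTYPE and
 * print one line per character, giving its offset, how many octets
 * mbrtowc() consumed, and what wcwidth() reports (0 for zero-width
 * characters, -1 for non-printable ones). Octets mbrtowc() rejects
 * are reported individually. Returns the number of characters
 * decoded. */
static size_t probe_widths(const char *buf, size_t len, FILE *out)
{
    mbstate_t state;
    size_t i = 0, count = 0;

    assert(out != NULL);
    memset(&state, 0, sizeof state);
    while (i < len) {
        wchar_t wc;
        size_t n = mbrtowc(&wc, buf + i, len - i, &state);

        if (n == (size_t)-1 || n == (size_t)-2) {
            /* Invalid or truncated: report this one octet, reset
             * the conversion state, and resume at the next octet. */
            fprintf(out, "offset %zu: octet 0x%02x rejected\n",
                    i, (unsigned)(unsigned char)buf[i]);
            memset(&state, 0, sizeof state);
            i++;
            continue;
        }
        if (n == 0)
            n = 1; /* an embedded NUL still occupies one octet */
        fprintf(out, "offset %zu: %zu octet(s), wcwidth %d\n",
                i, n, wcwidth(wc));
        i += n;
        count++;
    }
    return count;
}
```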
> >> (I haven't tested whether it gives me the rest of what I want, which
> >> is the 0x80-0x9f octets also being treated as single-octet
> >> printables.)
> > But they are not printable in ISO 8859-1, they are control
> > characters.
>
> That's right. I'm not trying to use straight-up 8859-1. What I'm
> trying to use here is a superset of 8859-1, one in which 0x80 through
> 0xff are all printable - basically, the behaviour 1.4T and 4.0.1 (and
> presumably everything in between) give me by default. 8859-1 is, I
> would say, mostly acceptable but not ideal.
I still don't get why you would want that. They have no graphical
representation.
Joerg