Re: curses vs non-ASCII

To: tech-userlevel%NetBSD.org@localhost
Subject: Re: curses vs non-ASCII
From: Joerg Sonnenberger <joerg%britannica.bec.de@localhost>
Date: Thu, 19 Nov 2015 21:09:43 +0100

On Thu, Nov 19, 2015 at 12:03:38PM -0500, Mouse wrote:
> >>>> [...] non-ASCII octets in strings are getting [] lost [...]
> >>> Have you called setlocale(3) appropriately?
> >> No.  I was not calling setlocale() at all.  [...]
> > In that case, it was using the "C" standard locale, which is using
> > ASCII only, no surprise.
> 
> Well, it surprised me.  It is a regression as compared to 1.4T and
> 4.0.1, each of which works exactly the way I want here, with 0x80
> through 0xff all being treated as printables.  (Test program below.)

Well, that would be a serious bug and I am somewhat surprised if
netbsd-4 really behaved like that -- it doesn't match any of the classic
character sets.

> > This is not about multibyte locales even.  You have quoted the
> > relevant part of the mbrtowc man page even...
> 
> I think the only part of mbrtowc(3) I've quoted in this thread is
> 
>      The behaviour of mbrtowc() is affected by the LC_CTYPE category of the
>      current locale.
> 
> which, while possibly relevant, is not obviously relevant; it is not at
> all clear to me from that that mbrtowc will drop octets corresponding
> to non-printing characters - and, indeed, I'm not sure it is; the issue
> may be in curses, which does something special for (wide) characters
> when wcwidth returns zero.  (I haven't followed the code enough to
> really understand just _how_ the behaviour is different.)

mbrtowc does not drop non-printing characters. At most it rejects input
with the high bit set as invalid for US-ASCII. That's a completely
different issue. I don't see anything in curses(3) that cares about
printability either.
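
A minimal sketch for checking this, assuming nothing beyond standard C:
it feeds a single octet to mbrtowc() and reports whether the current
LC_CTYPE locale accepts or rejects it. The helper names convert_octet
and report_octet are hypothetical and not taken from any test program
in this thread, and the result for octets with the high bit set depends
on the implementation and the locale in effect.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <wchar.h>

/*
 * Hypothetical helper: feed one octet to mbrtowc() with a fresh
 * conversion state and return mbrtowc()'s result unchanged.
 * (size_t)-1 means the octet was rejected as an invalid sequence,
 * (size_t)-2 means it was taken as the start of a longer sequence.
 */
static size_t
convert_octet(unsigned char octet, wchar_t *out)
{
	mbstate_t st;
	char in = (char)octet;

	memset(&st, 0, sizeof(st));
	return mbrtowc(out, &in, 1, &st);
}

/* Print how the current LC_CTYPE locale treats one octet. */
static void
report_octet(unsigned char octet)
{
	wchar_t wc = 0;
	size_t n = convert_octet(octet, &wc);

	if (n == (size_t)-1)
		printf("0x%02x: rejected (%s)\n", octet, strerror(errno));
	else if (n == (size_t)-2)
		printf("0x%02x: incomplete sequence\n", octet);
	else
		printf("0x%02x: accepted, wide value 0x%lx\n", octet,
		    (unsigned long)wc);
}
```

Calling report_octet(0xe9) once before and once after
setlocale(LC_CTYPE, "") (declared in <locale.h>) shows whether the
locale changes the outcome. Either way the octet is accepted or
rejected by the conversion; it is not silently dropped by mbrtowc().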

> However, I do have reason to think mbrtowc is part of the precipitate
> here; specifically, by default, the 5.2 curses not only drops the
> non-ASCII octet, but sometimes eats the following octet as well.  Even
> without any setlocale().

Test case, please.

> >> (I haven't tested whether it gives me the rest of what I want, which
> >> is the 0x80-0x9f octets also being treated as single-octet
> >> printables.)
> > But they are not printable in ISO 8859-1, they are control
> > characters.
> 
> That's right.  I'm not trying to use straight-up 8859-1.  What I'm
> trying to use here is a superset of 8859-1, one in which 0x80 through
> 0xff are all printable - basically, the behaviour 1.4T and 4.0.1 (and
> presumably everything in between) give me by default.  8859-1 is, I
> would say, mostly acceptable but not ideal.

I still don't get why you would want that. They have no graphical
representation.
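
To make the ranges concrete, here is a small sketch that classifies
octets according to the usual ISO-8859-1 layout, in which 0x80 through
0x9f are the C1 control range and 0xa0 through 0xff are graphic
characters. The enum and the function names latin1_classify and
latin1_count are hypothetical, and the sketch hard-codes the ranges
rather than asking the C library, so it does not depend on any locale.

```c
/* Hypothetical classification of an octet under the ISO-8859-1 layout. */
enum octet_class {
	OCTET_C0_CONTROL,	/* 0x00 - 0x1f */
	OCTET_GRAPHIC,		/* 0x20 - 0x7e and 0xa0 - 0xff */
	OCTET_DELETE,		/* 0x7f */
	OCTET_C1_CONTROL	/* 0x80 - 0x9f */
};

static enum octet_class
latin1_classify(unsigned char c)
{
	if (c <= 0x1f)
		return OCTET_C0_CONTROL;
	if (c <= 0x7e)
		return OCTET_GRAPHIC;
	if (c == 0x7f)
		return OCTET_DELETE;
	if (c <= 0x9f)
		return OCTET_C1_CONTROL;
	return OCTET_GRAPHIC;
}

/* Count how many of the 256 octet values fall into the given class. */
static int
latin1_count(enum octet_class want)
{
	int c, n = 0;

	for (c = 0; c <= 0xff; c++)
		if (latin1_classify((unsigned char)c) == want)
			n++;
	return n;
}
```

With this layout, 0x85 lands in the C1 control range while 0xe9 is a
graphic character, which is the distinction being argued about above.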

Joerg
