Subject: Re: Problems with libedit and international characters
To: None <netbsd-users@netbsd.org>
From: Christos Zoulas <christos@astron.com>
List: netbsd-users
Date: 09/23/2007 15:16:43
In article <Pine.LNX.4.64.0709231408380.12828@top.qwarx.com>,
Chris Wilson  <chris@qwirx.com> wrote:
>Hi all,
>
>[Please note: this message was just posted to the glibc locales list. 
>However I think it relates to libedit as well and would be interested to 
>speak to anyone who maintains it. I don't know whether it actually works 
>properly for international characters on BSD, perhaps someone could check? 
>Thanks!]
>
>I'm new to locales and struggling to fix a bug with libedit/editline (a 
>BSD-licensed readline library from NetBSD) running on glibc (this gives 
>legal readline support for BSD-licensed programs on Linux).
>
>editline works fine for most things, but international characters cannot 
>be entered (e.g. U00E6, &aelig, from Danish). I think I tracked it down to 
>a problem with isprint() in glibc. editline makes a map of insertable 
>characters from only the printable characters using isprint(), and glibc 
>tells it that 0xe6 is not printable in my locale (en_GB). It doesn't 
>recognise 0xe6 as being a command character either, so it beeps the 
>terminal and ignores it.
>
>I wrote a quick test program to verify that isprint(0xe6) and 
>iswprint(0xe6) both return zero in en_GB (on glibc). I have yet to find a 
>locale where it is printable. It's a Danish character, but da_DK does not 
>make it printable either. In my view, it displays fine on my terminal and 
>I think it should be printable.
>
>I'm not sure whether editline is doing the right thing here; I think it will 
>have problems with UTF-8 and multibyte character sets, because it reads 
>characters from the terminal one byte at a time, and checks them bytewise for 
>printability. It should probably convert the input to Unicode and check Unicode 
>characters for printability with iswprint().

Yes, it will have a problem with UTF-8 and multi-byte character sets. It
is not too difficult to fix, though. I have not bothered because there has
not been enough demand. If you want to do it, you can look at the tcsh
code (which the editline code was derived from) and copy the changes.
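
For illustration only, here is a minimal sketch of the general approach:
decode the input into wide characters with mbrtowc() and test each one
with iswprint(), instead of calling isprint() on every byte. This is not
the tcsh or libedit code; count_printable_mb() is a hypothetical helper
made up for this example, and it assumes the caller has already called
setlocale().

```c
#include <stddef.h>
#include <string.h>
#include <wchar.h>
#include <wctype.h>

/* Hypothetical helper, not part of libedit or tcsh: count the printable
 * characters in buf, decoding multibyte sequences according to the
 * current locale rather than testing each byte on its own. */
size_t count_printable_mb(const char *buf, size_t len)
{
    mbstate_t st;
    size_t count = 0;
    size_t pos = 0;

    memset(&st, 0, sizeof st);          /* start in the initial shift state */
    while (pos < len) {
        wchar_t wc;
        size_t n = mbrtowc(&wc, buf + pos, len - pos, &st);

        if (n == (size_t)-2)            /* incomplete sequence: need more input */
            break;
        if (n == (size_t)-1) {          /* invalid byte: skip it, reset the state */
            memset(&st, 0, sizeof st);
            pos++;
            continue;
        }
        if (n == 0)                     /* decoded a NUL; assumed one byte long */
            n = 1;
        if (iswprint((wint_t)wc))
            count++;
        pos += n;
    }
    return count;
}
```

An editor would apply the same test to each decoded character as it
arrives and keep the mbstate_t between reads, so that a character split
across two read() calls is still decoded correctly.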

>
>However, in my locale characters are one byte long, so this would have no 
>effect and iswprint(0x000000e6) returns exactly the same as isprint(0xe6), i.e. 
>zero. This looks like a bug in the glibc locales to me.

Yes, they should work properly; read on...

>
>I can't find out where the bug is, though. My /usr/share/i18n/locales/en_GB 
>lists the following:
>
>[... snip glibc locale tables ...]
>
>Here is my test program:
>
>chris@gcc(tmp)$ cat test.c
>#include <stdio.h>
>
>int main()
>{
>          int i;
>
>          for (i = 0200; i <= 0xff; i++)
>          {
>                  printf("%d", isprint(i));
>          }
>          puts("");
>          return 0;
>}

You need to call setlocale(); this works:

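#include <ctype.h>
#include <locale.h>
#include <stdio.h>

int main(void)
{
          int i;

          /* Adopt the locale from the environment (LANG, LC_*); without
             this call the program stays in the "C" locale. */
          setlocale(LC_ALL, "");

          for (i = 0200; i <= 0xff; i++)
          {
                  /* isprint() returns an unspecified nonzero value for
                     printable characters, so normalize it to 0 or 1. */
                  printf("%d", isprint(i) != 0);
          }
          puts("");
          return 0;
}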
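
If the locale named in the environment is not installed, setlocale()
returns NULL and the program stays in the "C" locale, so it is worth
checking the return value. A rough sketch of that check follows; the two
helper names are made up for this example, and the "before" count is
only meaningful if nothing has called setlocale() earlier.

```c
#include <ctype.h>
#include <locale.h>
#include <stdio.h>

/* Hypothetical helper: count the bytes in 0x80..0xff that isprint()
 * accepts in whatever locale is currently active. */
int count_printable_high_bytes(void)
{
    int i, count = 0;

    for (i = 0x80; i <= 0xff; i++)
        if (isprint(i))
            count++;
    return count;
}

/* Hypothetical helper: switch to the environment's locale and print the
 * count before and after. Returns 0 on success, -1 if setlocale() failed. */
int report_locale_effect(void)
{
    int before = count_printable_high_bytes();
    const char *name = setlocale(LC_ALL, "");

    if (name == NULL) {
        /* Locale not available: the previous locale stays in effect. */
        fprintf(stderr, "setlocale failed; check LANG and LC_*\n");
        return -1;
    }
    printf("locale %s: %d printable high bytes (was %d)\n",
           name, count_printable_high_bytes(), before);
    return 0;
}
```

Calling report_locale_effect() first thing in main() shows whether the
environment's locale was actually picked up.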