Subject: Problems with libedit and international characters
To: None <jess@thrysoee.dk, netbsd-users@netbsd.org>
From: Chris Wilson <chris@qwirx.com>
List: netbsd-users
Date: 09/23/2007 15:58:59
Hi all,

[Please note: this message was just posted to the glibc locales list. 
However I think it relates to libedit as well and would be interested to 
speak to anyone who maintains it. I don't know whether it actually works 
properly for international characters on BSD; perhaps someone could check? 
Thanks!]

I'm new to locales and struggling to fix a bug with libedit/editline (a 
BSD-licensed readline library from NetBSD) running on glibc (this gives 
legal readline support for BSD-licensed programs on Linux).

editline works fine for most things, but international characters cannot 
be entered (e.g. U+00E6, æ, from Danish). I think I tracked it down to 
a problem with isprint() in glibc. editline makes a map of insertable 
characters from only the printable characters using isprint(), and glibc 
tells it that 0xe6 is not printable in my locale (en_GB). It doesn't 
recognise 0xe6 as being a command character either, so it beeps the 
terminal and ignores it.
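
To make the mechanism concrete, here is a minimal sketch of the kind of table 
I mean. It is not libedit's actual code: build_insert_map(), enum key_action 
and MAP_SIZE are hypothetical names made up for illustration. Every byte that 
isprint() accepts is bound to "insert"; anything else stays unassigned, which 
is what ends in a beep.

```c
#include <ctype.h>

#define MAP_SIZE 256

/* Hypothetical action codes; libedit's real key map uses its own names. */
enum key_action { KEY_UNASSIGNED = 0, KEY_INSERT = 1 };

/*
 * Fill map[] so that every byte the current locale reports as printable
 * is bound to self-insert. Bytes that are neither printable nor bound to
 * a command elsewhere stay KEY_UNASSIGNED.
 */
void build_insert_map(enum key_action map[MAP_SIZE])
{
    for (int c = 0; c < MAP_SIZE; c++) {
        map[c] = isprint(c) ? KEY_INSERT : KEY_UNASSIGNED;
    }
}
```

With a table like this, whether 0xe6 can be typed depends entirely on what 
isprint(0xe6) returns at the moment the table is built.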

I wrote a quick test program to verify that isprint(0xe6) and 
iswprint(0xe6) both return zero in en_GB (on glibc). I have yet to find a 
locale where it is printable. It's a Danish character, but da_DK does not 
make it printable either. It displays fine on my terminal, so I think it 
should be printable.

I'm not sure whether editline is doing the right thing here; I think it will 
have problems with UTF-8 and multibyte character sets, because it reads 
characters from the terminal one byte at a time, and checks them bytewise for 
printability. It should probably convert the input to Unicode and check Unicode 
characters for printability with iswprint().

However, in my locale characters are one byte long, so this would have no 
effect and iswprint(0x000000e6) returns exactly the same as isprint(0xe6), i.e. 
zero. This looks like a bug in the glibc locales to me.
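
For reference, the wide-character check described above could look roughly 
like the following. This is only a sketch under my own assumptions: 
first_char_printable() is a hypothetical helper, not an editline function, and 
it decodes a single character using whatever encoding LC_CTYPE currently 
selects.

```c
#include <string.h>
#include <wchar.h>
#include <wctype.h>

/*
 * Decode one character from buf (len bytes) using the current LC_CTYPE
 * encoding and report whether it is printable.
 * Returns 1 if printable, 0 if not, and -1 if the bytes are invalid or
 * form an incomplete multibyte sequence.
 */
int first_char_printable(const char *buf, size_t len)
{
    mbstate_t state;
    wchar_t wc;
    size_t n;

    memset(&state, 0, sizeof state);      /* start in the initial shift state */
    n = mbrtowc(&wc, buf, len, &state);
    if (n == (size_t)-1 || n == (size_t)-2)
        return -1;                        /* invalid or incomplete sequence */
    return iswprint((wint_t)wc) ? 1 : 0;
}
```

A real line editor would have to keep the mbstate_t between reads so that a 
multibyte sequence arriving one byte at a time is assembled correctly; the 
sketch resets it on every call to stay short.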

I can't find out where the bug is, though. My /usr/share/i18n/locales/en_GB 
lists the following:

[... snip glibc locale tables ...]

Here is my test program:

chris@gcc(tmp)$ cat test.c
#include <ctype.h>
#include <stdio.h>

int main(void)
{
          int i;

          /* print isprint() for every byte in the upper half, 0x80..0xff */
          for (i = 0x80; i <= 0xff; i++)
          {
                  printf("%d", isprint(i));
          }
          puts("");
          return 0;
}

chris@gcc(tmp)$ locale
LANG=en_GB
LC_CTYPE="en_GB"
LC_NUMERIC="en_GB"
LC_TIME="en_GB"
LC_COLLATE="en_GB"
LC_MONETARY="en_GB"
LC_MESSAGES="en_GB"
LC_PAPER="en_GB"
LC_NAME="en_GB"
LC_ADDRESS="en_GB"
LC_TELEPHONE="en_GB"
LC_MEASUREMENT="en_GB"
LC_IDENTIFICATION="en_GB"
LC_ALL=

chris@gcc(tmp)$ ./test
00000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000

chris@gcc(tmp)$ LC_CTYPE=da_DK ./test
00000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000

chris@gcc(tmp)$ cat /etc/issue
Fedora Core release 6 (Zod)

chris@gcc(tmp)$ rpm -q glibc
glibc-2.5-18.fc6
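
One thing the test program above never does is call setlocale(). A C program 
starts in the "C" locale regardless of LANG or LC_CTYPE until it calls 
setlocale(), and in the "C" locale isprint() is false for every byte above 
0x7f, which would give exactly the all-zero output shown. Whether that is the 
real cause here is an assumption I have not verified. A minimal sketch for 
checking it follows; count_printable_high() is a hypothetical helper, not part 
of the original test.

```c
#include <ctype.h>
#include <locale.h>
#include <stddef.h>

/*
 * Switch LC_CTYPE to locale_name ("" means "take it from the environment")
 * and count how many bytes in 0x80..0xff isprint() accepts.
 * Returns -1 if the locale cannot be selected.
 * Note: this changes the process-wide LC_CTYPE and does not restore it.
 */
int count_printable_high(const char *locale_name)
{
    int count = 0;

    if (setlocale(LC_CTYPE, locale_name) == NULL)
        return -1;

    for (int c = 0x80; c <= 0xff; c++) {
        if (isprint(c))
            count++;
    }
    return count;
}
```

Calling count_printable_high("") picks up LANG/LC_CTYPE from the environment. 
If that returns a nonzero count under en_GB or da_DK while 
count_printable_high("C") returns 0, the locale data is probably fine and the 
missing setlocale() call is the likelier culprit.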

Can anyone help me to figure out what's wrong, or what I'm doing wrong?

Cheers, Chris.
-- 
_____ __     _
\  __/ / ,__(_)_  | Chris Wilson <0000 at qwirx.com> - Cambs UK |
/ (_/ ,\/ _/ /_ \ | Security/C/C++/Java/Perl/SQL/HTML Developer |
\ _/_/_/_//_/___/ | We are GNU-free your mind-and your software |