Subject: Problems with libedit and international characters
To: None <jess@thrysoee.dk, netbsd-users@netbsd.org>
From: Chris Wilson <chris@qwirx.com>
List: netbsd-users
Date: 09/23/2007 15:58:59
Hi all,
[Please note: this message was just posted to the glibc locales list.
However I think it relates to libedit as well and would be interested to
speak to anyone who maintains it. I don't know whether it actually works
properly for international characters on BSD, perhaps someone could check?
Thanks!]
I'm new to locales and struggling to fix a bug with libedit/editline (a
BSD-licensed readline replacement from NetBSD) running on glibc (this gives
BSD-licensed programs on Linux a licence-compatible alternative to GNU readline).
editline works fine for most things, but international characters cannot
be entered (e.g. U+00E6, æ, from Danish). I think I have tracked it down to
a problem with isprint() in glibc. editline makes a map of insertable
characters from only the printable characters using isprint(), and glibc
tells it that 0xe6 is not printable in my locale (en_GB). It doesn't
recognise 0xe6 as being a command character either, so it beeps the
terminal and ignores it.
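
The insert map described above can be sketched roughly as follows. This is not libedit's actual code: the function name build_insert_map and the 256-entry table are assumptions made purely for illustration. It only shows why a byte that isprint() rejects ends up without an insert binding.

```c
#include <ctype.h>

/*
 * Hypothetical sketch, not taken from libedit: mark every byte value that
 * isprint() accepts in the current LC_CTYPE locale as insertable (1) and
 * everything else as not insertable (0).
 */
void build_insert_map(unsigned char map[256])
{
    int c;

    for (c = 0; c < 256; c++)
        map[c] = isprint(c) ? 1 : 0;
}
```

Because the table is filled from isprint(), its contents depend entirely on whichever locale is in effect when the function runs.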
I wrote a quick test program to verify that isprint(0xe6) and
iswprint(0xe6) both return zero in en_GB (on glibc). I have yet to find a
locale where it is printable. It's a Danish character, but da_DK does not
make it printable either. It displays fine on my terminal, so in my view
it should be printable.
I'm not sure whether editline is doing the right thing here; I think it will
have problems with UTF-8 and multibyte character sets, because it reads
characters from the terminal one byte at a time, and checks them bytewise for
printability. It should probably convert the input to Unicode and check Unicode
characters for printability with iswprint().
However, in my locale characters are one byte long, so this would have no
effect and iswprint(0x000000e6) returns exactly the same as isprint(0xe6), i.e.
zero. This looks like a bug in the glibc locales to me.
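
A minimal sketch of the wide-character check suggested above, assuming the input bytes are already in a buffer. The function name first_char_printable is hypothetical and this is not how editline currently works. It decodes one character with mbrtowc() according to the current LC_CTYPE locale and tests it with iswprint().

```c
#include <string.h>
#include <wchar.h>
#include <wctype.h>

/*
 * Hypothetical helper: decode the first multibyte character in buf
 * (at most len bytes) using the current LC_CTYPE locale.
 * Returns 1 if it is printable, 0 if it is not, and -1 if the bytes
 * are an invalid or incomplete multibyte sequence.
 */
int first_char_printable(const char *buf, size_t len)
{
    mbstate_t state;
    wchar_t wc;
    size_t n;

    memset(&state, 0, sizeof state);   /* start in the initial shift state */
    n = mbrtowc(&wc, buf, len, &state);
    if (n == (size_t)-1 || n == (size_t)-2)
        return -1;                     /* invalid or truncated sequence */
    return iswprint((wint_t)wc) ? 1 : 0;
}
```

In a single-byte locale this decodes exactly one byte per call, so it should agree with the bytewise isprint() check. The difference would only show up in multibyte locales such as UTF-8.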
I can't find out where the bug is, though. My /usr/share/i18n/locales/en_GB
lists the following:
[... snip glibc locale tables ...]
Here is my test program:
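chris@gcc(tmp)$ cat test.c
#include <ctype.h>
#include <stdio.h>

/* Print the isprint() result for every byte value from 0200 (0x80) to 0xff. */
int main(void)
{
    int i;

    for (i = 0200; i <= 0xff; i++)
    {
        printf("%d", isprint(i));
    }
    puts("");
    return 0;
}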
chris@gcc(tmp)$ locale
LANG=en_GB
LC_CTYPE="en_GB"
LC_NUMERIC="en_GB"
LC_TIME="en_GB"
LC_COLLATE="en_GB"
LC_MONETARY="en_GB"
LC_MESSAGES="en_GB"
LC_PAPER="en_GB"
LC_NAME="en_GB"
LC_ADDRESS="en_GB"
LC_TELEPHONE="en_GB"
LC_MEASUREMENT="en_GB"
LC_IDENTIFICATION="en_GB"
LC_ALL=
chris@gcc(tmp)$ ./test
00000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
chris@gcc(tmp)$ LC_CTYPE=da_DK ./test
00000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
chris@gcc(tmp)$ cat /etc/issue
Fedora Core release 6 (Zod)
chris@gcc(tmp)$ rpm -q glibc
glibc-2.5-18.fc6
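
One thing that may be worth ruling out: test.c never calls setlocale(), and a C program starts in the "C" locale until it does, whatever LANG or LC_CTYPE say. The sketch below is an assumption about what a locale-aware variant of the test could look like, not a confirmed diagnosis. The function name count_printable_high_bytes is made up for illustration.

```c
#include <ctype.h>
#include <locale.h>
#include <stddef.h>

/*
 * Hypothetical variant of the test: switch LC_CTYPE to locale_name first
 * ("" means "take it from the environment"), then count how many byte
 * values in 0x80..0xff isprint() accepts.
 * Returns -1 if the requested locale cannot be selected.
 */
int count_printable_high_bytes(const char *locale_name)
{
    int i, count = 0;

    if (setlocale(LC_CTYPE, locale_name) == NULL)
        return -1;

    for (i = 0x80; i <= 0xff; i++)
    {
        if (isprint(i))
            count++;
    }
    return count;
}
```

Calling count_printable_high_bytes("") from main() and printing the result under LC_CTYPE=en_GB and LC_CTYPE=da_DK would show whether the zeros above come from the locale data or from the program still running in the "C" locale.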
Can anyone help me to figure out what's wrong, or what I'm doing wrong?
Cheers, Chris.
--
_____ __ _
\ __/ / ,__(_)_ | Chris Wilson <0000 at qwirx.com> - Cambs UK |
/ (_/ ,\/ _/ /_ \ | Security/C/C++/Java/Perl/SQL/HTML Developer |
\ _/_/_/_//_/___/ | We are GNU-free your mind-and your software |