NetBSD-Bugs archive
lib/44603: editline el_gets drops many UTF-8 characters
>Number: 44603
>Category: lib
>Synopsis: editline el_gets drops many UTF-8 characters
>Confidential: no
>Severity: serious
>Priority: medium
>Responsible: lib-bug-people
>State: open
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Fri Feb 18 23:05:00 +0000 2011
>Originator: Steven Vernon
>Release: sources as of 2011/02/04
>Organization:
Citrix
>Environment:
>Description:
When input is read with el_gets() in editline (which the readline emulation
function readline() also calls), every multi-byte character is dropped. This
is wrong in a UTF-8 locale, because in UTF-8 every non-ASCII character is
encoded as more than one byte.
>How-To-Repeat:
Call either el_gets() or readline() from a libedit built for UTF-8 (build with
WIDECHAR, which is the default), set the locale to some UTF-8 variant such as
en_US.UTF-8 (e.g. set the environment variable LC_ALL to this), and enter any
non-ASCII character at the prompt.
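Before trying to reproduce, it may be worth confirming that the test process
really ends up in a UTF-8 locale, since a missing locale definition silently
falls back to "C". The following is only a sketch using the standard
setlocale() and nl_langinfo() calls; the helper names codeset_is_utf8() and
environment_locale_is_utf8() are made up for illustration and are not part of
libedit.

```c
#include <assert.h>
#include <langinfo.h>
#include <locale.h>
#include <strings.h>

/* Return 1 if the codeset name denotes UTF-8 ("UTF-8" or "utf8"), else 0. */
int codeset_is_utf8(const char *codeset)
{
    assert(codeset != NULL);
    return strcasecmp(codeset, "UTF-8") == 0 ||
           strcasecmp(codeset, "UTF8") == 0;
}

/* Adopt the locale named by the environment (LC_ALL, LANG, ...) and
 * report whether its character encoding is UTF-8. */
int environment_locale_is_utf8(void)
{
    if (setlocale(LC_ALL, "") == NULL)
        return 0;   /* the requested locale is not available */
    return codeset_is_utf8(nl_langinfo(CODESET));
}
```

If environment_locale_is_utf8() returns 0 with LC_ALL=en_US.UTF-8 exported,
the locale is probably not installed and the bug will not be exercised.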
>Fix:
el_gets() unconditionally sets IGNORE_EXTCHARS before calling el_wgets() (and
then resets it after the call). This causes read_char() to drop multi-byte
characters.
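To illustrate the visible effect, here is a small standalone model of the
behaviour described above. It is not the libedit source: model_read_line(),
utf8_seq_len() and the ignore_extchars parameter are hypothetical names, and
the model only assumes that a character decoded from more than one byte is
skipped while the flag is in effect.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Length in bytes of the UTF-8 sequence introduced by lead byte c,
 * or 0 if c cannot start a sequence (e.g. a continuation byte). */
static size_t utf8_seq_len(unsigned char c)
{
    if (c < 0x80) return 1;
    if ((c & 0xE0) == 0xC0) return 2;
    if ((c & 0xF0) == 0xE0) return 3;
    if ((c & 0xF8) == 0xF0) return 4;
    return 0;
}

/* Copy "in" to "out" one character at a time. When ignore_extchars is
 * nonzero, every character longer than one byte is skipped, which is the
 * symptom seen through el_gets(). Returns the number of bytes written,
 * not counting the terminating NUL. */
size_t model_read_line(const char *in, int ignore_extchars,
                       char *out, size_t outsz)
{
    size_t n = 0;

    assert(in != NULL && out != NULL && outsz > 0);
    while (*in != '\0') {
        size_t len = utf8_seq_len((unsigned char)*in);
        size_t i;

        if (len == 0) {             /* stray byte: discard it */
            in++;
            continue;
        }
        /* every following byte must be a continuation byte (10xxxxxx) */
        for (i = 1; i < len; i++)
            if (((unsigned char)in[i] & 0xC0) != 0x80)
                break;
        if (i < len) {              /* malformed or truncated sequence */
            in++;
            continue;
        }
        if (ignore_extchars && len > 1) {
            in += len;              /* drop the whole multi-byte character */
            continue;
        }
        if (n + len >= outsz)       /* keep room for the NUL */
            break;
        memcpy(out + n, in, len);
        n += len;
        in += len;
    }
    out[n] = '\0';
    return n;
}
```

With the input "h\xc3\xa9llo" (an "e" with acute accent as the second
character) the model returns "hllo" when ignore_extchars is set and the
unchanged six-byte string when it is clear.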
There are 2 possible solutions:
1) Only set IGNORE_EXTCHARS if CHARSET_IS_UTF8 is not set (and don't unset it
after the call to el_wgets()), as is done in el_getc().
2) Have read_char() not honor IGNORE_EXTCHARS if CHARSET_IS_UTF8 is set.
Offhand, this seems like the better, more correct solution, but it could
affect more paths through the code. If you do this, you should probably also
remove the code in el_getc() that conditionally sets and unsets
IGNORE_EXTCHARS.
More testing on UTF-8 should be done.
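The flag logic of the two options can be compared with a small model. This is
a sketch only, not a patch: the flag names are taken from the description
above, but the bit values and all four function names are placeholders chosen
for illustration, and the real code manipulates the flags inside the EditLine
structure.

```c
#include <assert.h>

/* Flag names follow the report; the bit values are placeholders. */
enum {
    CHARSET_IS_UTF8 = 0x01,
    IGNORE_EXTCHARS = 0x02
};

/* Current behaviour as described: el_gets() always adds IGNORE_EXTCHARS
 * for the duration of the el_wgets() call. */
unsigned flags_during_gets_current(unsigned flags)
{
    return flags | IGNORE_EXTCHARS;
}

/* Option 1: add IGNORE_EXTCHARS only when the charset is not UTF-8. */
unsigned flags_during_gets_option1(unsigned flags)
{
    if (!(flags & CHARSET_IS_UTF8))
        flags |= IGNORE_EXTCHARS;
    return flags;
}

/* Current drop decision as described for read_char(): a character made
 * of more than one byte is dropped whenever IGNORE_EXTCHARS is set. */
int drops_char_current(unsigned flags, int nbytes)
{
    assert(nbytes >= 1);
    return (flags & IGNORE_EXTCHARS) != 0 && nbytes > 1;
}

/* Option 2: read_char() does not honor IGNORE_EXTCHARS under UTF-8. */
int drops_char_option2(unsigned flags, int nbytes)
{
    assert(nbytes >= 1);
    return (flags & IGNORE_EXTCHARS) != 0 &&
           (flags & CHARSET_IS_UTF8) == 0 && nbytes > 1;
}
```

In this model, a two-byte character in a UTF-8 session is dropped only by the
combination of flags_during_gets_current() and drops_char_current(). Changing
either half (option 1 or option 2) keeps the character, while non-UTF-8
sessions behave as before.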