NetBSD-Bugs archive


lib/44603: editline el_gets drops many UTF-8 characters



>Number:         44603
>Category:       lib
>Synopsis:       editline el_gets drops many UTF-8 characters
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    lib-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Fri Feb 18 23:05:00 +0000 2011
>Originator:     Steven Vernon
>Release:        sources as of 2011/02/04
>Organization:
Citrix
>Environment:
>Description:
el_gets() in editline, which the readline emulation function readline() 
calls, always drops multi-byte characters. This is wrong in a UTF-8 locale, 
where every non-ASCII character is encoded as more than one byte.
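
To illustrate how much input is affected, here is a minimal sketch in standard
C, independent of libedit, that classifies a UTF-8 lead byte by sequence
length. The helper names utf8_seq_len() and count_multibyte() are made up for
this example and do not exist in editline.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Return the length in bytes of the UTF-8 sequence introduced by `lead`,
 * or 0 if `lead` cannot start a sequence (continuation or invalid byte).
 */
static size_t
utf8_seq_len(unsigned char lead)
{
	if (lead < 0x80)
		return 1;	/* ASCII: the only single-byte characters */
	if (lead >= 0xC2 && lead <= 0xDF)
		return 2;
	if (lead >= 0xE0 && lead <= 0xEF)
		return 3;
	if (lead >= 0xF0 && lead <= 0xF4)
		return 4;
	return 0;
}

/* Count the characters in `s` that occupy more than one byte. */
static size_t
count_multibyte(const char *s)
{
	size_t n = 0;

	assert(s != NULL);
	while (*s != '\0') {
		size_t len = utf8_seq_len((unsigned char)*s);

		if (len == 0) {		/* stray byte: skip it */
			s++;
			continue;
		}
		if (len > 1)
			n++;
		/* Advance over the sequence without running past the NUL. */
		while (len > 0 && *s != '\0') {
			s++;
			len--;
		}
	}
	return n;
}
```

For example, count_multibyte("caf\xc3\xa9") counts one multi-byte character
(the trailing e-acute), which is the kind of character el_gets() loses.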
>How-To-Repeat:
Use either el_gets() or readline() when compiled for UTF-8 (build with 
WIDECHAR, which is the default) and set the locale to some UTF-8 variant, such 
as en_US.UTF-8 (e.g. set the environment variable LC_ALL to this).
>Fix:
el_gets() unconditionally sets IGNORE_EXTCHARS before calling el_wgets() (and 
then resets it after the call). This causes read_char() to drop multi-byte 
characters.

There are 2 possible solutions:
1) Only set IGNORE_EXTCHARS if CHARSET_IS_UTF8 is not set (and don't unset it 
after the call to el_wgets()), as is done in el_getc().
2) Have read_char() not honor IGNORE_EXTCHARS if CHARSET_IS_UTF8 is set. 
Offhand, this seems like the better, more correct solution, but it could 
affect more paths through the code. If you do this you should probably remove 
the code from el_getc() that conditionally sets and unsets IGNORE_EXTCHARS.
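
The two options can be compared with a small standalone model. This is only a
sketch of the flag logic described above, not the libedit source: the flag
values, the model_* function names, and the byte-level filter standing in for
read_char() are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flag values for this model; libedit's real values may differ. */
#define IGNORE_EXTCHARS	0x01u
#define CHARSET_IS_UTF8	0x02u

/*
 * Stand-in for read_char(): copy `in` to `out`, dropping non-ASCII bytes
 * when IGNORE_EXTCHARS is in effect. With `utf8_exempt` nonzero (option 2),
 * the flag is not honored while CHARSET_IS_UTF8 is set.
 */
static size_t
model_read(const char *in, char *out, size_t outsz, unsigned flags,
    int utf8_exempt)
{
	size_t n = 0;
	int drop;

	assert(in != NULL && out != NULL && outsz > 0);
	drop = (flags & IGNORE_EXTCHARS) != 0;
	if (utf8_exempt && (flags & CHARSET_IS_UTF8))
		drop = 0;
	for (; *in != '\0' && n + 1 < outsz; in++) {
		if (drop && ((unsigned char)*in & 0x80))
			continue;
		out[n++] = *in;
	}
	out[n] = '\0';
	return n;
}

/* Reported behavior: the flag is set unconditionally around the read. */
static size_t
model_gets_current(const char *in, char *out, size_t outsz, unsigned flags)
{
	return model_read(in, out, outsz, flags | IGNORE_EXTCHARS, 0);
}

/* Option 1: the caller sets the flag only when the charset is not UTF-8. */
static size_t
model_gets_option1(const char *in, char *out, size_t outsz, unsigned flags)
{
	if (!(flags & CHARSET_IS_UTF8))
		flags |= IGNORE_EXTCHARS;
	return model_read(in, out, outsz, flags, 0);
}

/* Option 2: the caller is unchanged; the reader ignores the flag for UTF-8. */
static size_t
model_gets_option2(const char *in, char *out, size_t outsz, unsigned flags)
{
	return model_read(in, out, outsz, flags | IGNORE_EXTCHARS, 1);
}
```

In this model the flags are passed by value, so nothing needs to be reset
after the call. With CHARSET_IS_UTF8 set, both options pass "caf\xc3\xa9"
through intact while the current behavior truncates it to "caf"; without
CHARSET_IS_UTF8, all three variants still drop the extended bytes.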

More testing on UTF-8 should be done.


