NetBSD-Bugs archive
lib/44600: libedit does not properly handle UTF-8 when glyphs are multiple Unicode characters
>Number: 44600
>Category: lib
>Synopsis: libedit does not properly handle UTF-8 when glyphs are
>multiple Unicode characters
>Confidential: no
>Severity: non-critical
>Priority: medium
>Responsible: lib-bug-people
>State: open
>Class: change-request
>Submitter-Id: net
>Arrival-Date: Fri Feb 18 20:00:00 +0000 2011
>Originator: Steven Vernon
>Release: sources as of 2011/02/04
>Organization:
Citrix
>Environment:
>Description:
libedit, when using UTF-8, assumes that one glyph (visible character) corresponds
to exactly one Unicode "code point" (character number) [and it reasonably assumes
one glyph takes up one column and one row]. Unfortunately that is not always the
case. There are non-composed glyphs that are encoded as several Unicode code
points. Examples include European languages with accents that are not composed
(e.g. a French "e" with a circumflex accent, entered as two separate Unicode
characters) and Indic scripts such as the Devanagari used for Hindi, where
dependent vowel signs and the virama are separate code points attached to the
preceding consonant.
For such input libedit neither deletes characters correctly nor updates the
cursor position correctly.
>How-To-Repeat:
Enter data with non-composed accents or viramas, etc. Try backspacing over the
data, moving the cursor left/right and deleting and/or inserting, and
redisplaying after changes are made.
Beware that some character combinations also have pre-composed versions, which
are given a single Unicode code point, such as the above French "e" with
circumflex accent. These were created only for backward compatibility with
certain character sets, such as Latin-1. Make sure you enter the non-composed
versions when testing with these values.
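To check which form a test string is in, it can help to count its code points.
The following is only a minimal sketch in C; count_code_points is a
hypothetical helper, not part of libedit, and it assumes the input is valid
UTF-8. It counts every byte that is not a UTF-8 continuation byte.

```c
#include <stddef.h>

/*
 * Hypothetical helper, not part of libedit: count the Unicode code points
 * in a NUL-terminated string, assuming the string is valid UTF-8.
 * Continuation bytes have the bit pattern 10xxxxxx and are skipped.
 */
static size_t
count_code_points(const char *s)
{
	size_t n = 0;

	for (; *s != '\0'; s++) {
		if (((unsigned char)*s & 0xC0) != 0x80)
			n++;
	}
	return n;
}
```

With this, the pre-composed "e" with circumflex (U+00EA, bytes C3 AA) counts as
one code point, while the non-composed form (U+0065 followed by U+0302, bytes
65 CC 82) counts as two, although both display as a single glyph.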
>Fix:
libedit probably needs to import the Unicode information that determines which
characters are combining. I believe that in all cases such combining characters
follow the base character. See the Unicode site.
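As a rough illustration of the idea, and not a patch against libedit, the
sketch below shows how a backspace or cursor-left operation could skip back
over combining characters to the base character. Both function names are
hypothetical, the buffer is assumed to hold one code point per wchar_t, and
is_combining checks only two illustrative ranges where a real fix would use
the full Unicode data.

```c
#include <stddef.h>
#include <wchar.h>

/*
 * Hypothetical helper, not part of libedit: report whether a code point
 * is a combining mark. Only two illustrative cases are checked here;
 * a real fix would consult the full Unicode character data.
 */
static int
is_combining(wchar_t c)
{
	if (c >= 0x0300 && c <= 0x036F)	/* Combining Diacritical Marks */
		return 1;
	if (c == 0x094D)		/* Devanagari sign virama */
		return 1;
	return 0;
}

/*
 * Return the index at which the glyph ending just before `cursor` starts:
 * step back one code point, then keep stepping back over combining marks
 * until a non-combining (base) character or the buffer start is reached.
 */
static size_t
prev_glyph_start(const wchar_t *buf, size_t cursor)
{
	size_t i = cursor;

	if (i == 0)
		return 0;
	i--;
	while (i > 0 && is_combining(buf[i]))
		i--;
	return i;
}
```

For a buffer holding U+0065, U+0302, U+0078 with the cursor after the accent
(index 2), prev_glyph_start returns 0, so a backspace would remove the base
letter and its accent together instead of leaving a stray base letter.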