lib/44600: libedit does not properly handle UTF-8 when glyphs are multiple Unicode characters

To: lib-bug-people%netbsd.org@localhost,gnats-admin%netbsd.org@localhost,netbsd-bugs%netbsd.org@localhost
Subject: lib/44600: libedit does not properly handle UTF-8 when glyphs are multiple Unicode characters
From: steve.vernon%citrix.com@localhost
Date: Fri, 18 Feb 2011 20:00:01 +0000 (UTC)

>Number:         44600
>Category:       lib
>Synopsis:       libedit does not properly handle UTF-8 when glyphs are multiple Unicode characters
>Confidential:   no
>Severity:       non-critical
>Priority:       medium
>Responsible:    lib-bug-people
>State:          open
>Class:          change-request
>Submitter-Id:   net
>Arrival-Date:   Fri Feb 18 20:00:00 +0000 2011
>Originator:     Steven Vernon
>Release:        sources as of 2011/02/04
>Organization:
Citrix
>Environment:
>Description:
When using UTF-8, libedit assumes that one glyph (visible character) corresponds 
to one Unicode "code point" (character number) [and it reasonably assumes one 
glyph takes up one column and one row]. Unfortunately that is not always the 
case: some glyphs are encoded as a sequence of several Unicode code points. 
Examples include accented letters in European languages entered in non-composed 
form (e.g. a French "e" with a circumflex accent, encoded as the letter followed 
by a separate combining accent) and Indic scripts such as Devanagari, used for 
Hindi, where dependent vowel signs and the virama are separate code points 
attached to a consonant.

As a result, libedit neither deletes characters correctly nor updates the cursor 
position correctly for such input.
>How-To-Repeat:
Enter data with non-composed accents or viramas, etc. Try backspacing over the 
data, moving the cursor left/right and deleting and/or inserting, and 
redisplaying after changes are made.

Beware that some character combinations also have pre-composed versions, which 
are assigned a single Unicode code point, such as the French "e" with circumflex 
accent mentioned above. These were created only for backward compatibility with 
certain character sets, such as Latin-1. Make sure you enter the non-composed 
versions when testing with these characters.
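
For anyone preparing test input, the difference between the two forms can be 
made concrete with a small sketch. The byte sequences below are the standard 
UTF-8 encodings of the code points named in the comments; the variable names 
and the count_code_points() helper are hypothetical, written only for this 
illustration, and are not part of libedit. The helper assumes well-formed UTF-8.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Non-composed form: U+0065 LATIN SMALL LETTER E followed by
 * U+0302 COMBINING CIRCUMFLEX ACCENT (3 bytes in UTF-8). */
static const char e_circumflex_decomposed[] = "e\xCC\x82";

/* Pre-composed form: U+00EA LATIN SMALL LETTER E WITH CIRCUMFLEX
 * (2 bytes in UTF-8). */
static const char e_circumflex_precomposed[] = "\xC3\xAA";

/* Devanagari: U+0915 KA followed by U+093F VOWEL SIGN I. */
static const char ka_vowel_sign_i[] = "\xE0\xA4\x95\xE0\xA4\xBF";

/*
 * Hypothetical helper: count the code points in a well-formed UTF-8
 * string by counting every byte that is not a continuation byte
 * (bit pattern 10xxxxxx).
 */
static size_t
count_code_points(const char *s)
{
	size_t n = 0;

	assert(s != NULL);
	for (; *s != '\0'; s++)
		if (((unsigned char)*s & 0xC0) != 0x80)
			n++;
	return n;
}
```

Both forms of the accented "e" are normally drawn as one glyph, yet the 
non-composed string contains two code points and the pre-composed one contains 
a single code point. Only the first kind of input exercises the problem 
described in this report.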
>Fix:
libedit probably needs to import the Unicode data that identifies which 
characters are combining. I believe that in all cases such combining characters 
follow the base character. See the Unicode site.
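
A minimal sketch of the idea, assuming the line buffer holds one code point per 
element: when moving left or backspacing, step back over any combining 
characters until the base character is reached. The names is_combining() and 
prev_glyph_start() are hypothetical and do not exist in libedit, and the range 
table is deliberately incomplete; a real fix would take the ranges from the 
Unicode data files rather than hard-coding two blocks.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical, incomplete check: covers only the Combining Diacritical
 * Marks block and the Devanagari dependent vowel signs plus virama.
 */
static int
is_combining(uint32_t cp)
{
	return (cp >= 0x0300 && cp <= 0x036F) ||
	    (cp >= 0x093E && cp <= 0x094D);
}

/*
 * Hypothetical helper: given a buffer of code points and a cursor index
 * (the position just after a glyph), return the index where that glyph
 * starts. Relies on combining characters following their base character.
 */
static size_t
prev_glyph_start(const uint32_t *buf, size_t cursor)
{
	assert(buf != NULL);
	if (cursor == 0)
		return 0;
	cursor--;		/* step onto the last code point */
	while (cursor > 0 && is_combining(buf[cursor]))
		cursor--;	/* skip back to the base character */
	return cursor;
}
```

A backspace would then delete everything from the returned index up to the old 
cursor position instead of a single code point. This sketch does not handle 
conjuncts formed when a virama is followed by another consonant, nor display 
widths other than one column.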
