NetBSD-Bugs archive


Re: lib/57064: Import OpenBSD's script to autogen Unicode ctype definition?



The following reply was made to PR lib/57064; it has been noted by GNATS.

From: Joerg Sonnenberger <joerg%bec.de@localhost>
To: gnats-bugs%netbsd.org@localhost
Cc: lib-bug-people%netbsd.org@localhost, gnats-admin%netbsd.org@localhost,
	netbsd-bugs%netbsd.org@localhost
Subject: Re: lib/57064: Import OpenBSD's script to autogen Unicode ctype
 definition?
Date: Tue, 18 Oct 2022 20:15:21 +0200

 Am Tue, Oct 18, 2022 at 04:30:01AM +0000 schrieb rokuyama.rk%gmail.com@localhost:
 > Unicode has added thousands of characters per year in a totally
 > unorganized way. Our ctype definition for UTF-8 has been left
 > untouched in the last decade, with very few exceptions:
 
 I had a Python script for converting the CLDR definitions directly to
 the full locale descriptions, but it seems I have misplaced it in
 recent years.
 
 > Also note that switching to OpenBSD's ctype definition of UTF-8 does
 > *not* completely resolve our problems related to UTF-8. Our Citrus
 > locale does not recognize combining characters (incl. variation
 > selectors). Such characters may confuse applications.
 
 That's not really surprising. Combining characters don't fit the classic
 model of ISO C very well. They confuse a lot of software that makes poor
 assumptions, such as every glyph corresponding 1:1 to a Unicode code
 point. But that's beyond the scope of this PR.
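
 To illustrate the 1:1 assumption breaking down, here is a minimal Python
 sketch. The helper names are hypothetical, and skipping every code point
 in a Mark category (Mn, Mc, Me) is only a rough approximation of what a
 user perceives as one character; it is not full grapheme cluster
 segmentation and has nothing to do with how Citrus classifies characters.

```python
import unicodedata


def count_code_points(text: str) -> int:
    """Number of Unicode code points in the string."""
    return len(text)


def count_base_characters(text: str) -> int:
    """Rough count of visible characters (hypothetical helper).

    Assumption: any code point whose General_Category starts with "M"
    (combining marks, including variation selectors) attaches to the
    preceding base character and is not counted on its own.
    """
    return sum(
        1 for ch in text
        if not unicodedata.category(ch).startswith("M")
    )


# 'e' followed by COMBINING ACUTE ACCENT: two code points, one glyph.
accented = "e\u0301"

# HEAVY BLACK HEART followed by VARIATION SELECTOR-16.
heart = "\u2764\ufe0f"

for sample in (accented, heart):
    print(count_code_points(sample), count_base_characters(sample))
```

 Both samples are two code points long but count as one base character
 under this approximation, which is the kind of mismatch that trips up
 software indexing or measuring text per code point.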
 
 Joerg
 

