NetBSD-Bugs archive


Re: lib/57064: Import OpenBSD's script to autogen Unicode ctype definition?



On Tue, Oct 18, 2022 at 04:30:01AM +0000, rokuyama.rk%gmail.com@localhost wrote:
> Unicode has added thousands of characters per year in a totally
> unorganized way. Our ctype definition for UTF-8 has been left
> untouched for the last decade, with very few exceptions:

I had a Python script for converting the CLDR definitions directly to
the full locale descriptions, but it seems I have misplaced it in
recent years.

> Also note that switching to OpenBSD's ctype definition of UTF-8 does
> *not* completely resolve our problems related to UTF-8. Our Citrus
> locale does not recognize combining characters (incl. variation
> selectors). Such characters may confuse applications.

That's not really surprising. Combining characters don't fit the classic
model of ISO C very well. They confuse a lot of software that makes poor
assumptions, such as every glyph corresponding 1:1 to a Unicode code
point. But that's beyond the scope of this PR.
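
To illustrate how the 1:1 assumption breaks, here is a minimal Python
sketch using only the standard unicodedata module. It is not related to
the Citrus code; the helper name count_base_characters is made up for
this example, and skipping general category M* is only a rough stand-in
for real grapheme cluster segmentation.

```python
import unicodedata


def count_base_characters(text):
    """Count code points that are not combining marks (general category M*).

    Hypothetical helper for illustration only: this approximates
    "user-perceived characters" and is not full grapheme segmentation.
    """
    return sum(1 for ch in text if not unicodedata.category(ch).startswith("M"))


# 'e' followed by U+0301 COMBINING ACUTE ACCENT: two code points, one glyph.
decomposed = "e\u0301"
# U+2764 HEAVY BLACK HEART followed by U+FE0F VARIATION SELECTOR-16.
with_selector = "\u2764\ufe0f"

for sample in (decomposed, with_selector):
    # Code point count versus count with combining marks skipped.
    print(len(sample), count_base_characters(sample))

# NFC composes the first sample into the single code point U+00E9.
print(len(unicodedata.normalize("NFC", decomposed)))
```

Software that takes the code point count as the number of glyphs
overcounts both samples, and normalization only helps for the first
one, since a variation selector has no precomposed form.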

Joerg

