tech-userlevel archive


Re: __STDC_ISO_10646__



On Thu, Jun 01, 2017 at 11:05:01AM -0700, Konrad Schroder wrote:
> Is there any particular reason not to implement the requirements for
> __STDC_ISO_10646__, that is, to use Unicode UCS for wchar_t?  Right now we
> use a locale-dependent encoding (and we are not alone in this).
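For reference, a program can test for this guarantee at compile time. The sketch below is only an illustration: wchar_is_ucs() is a hypothetical helper name, and the only fact it relies on is that C99 lets an implementation define __STDC_ISO_10646__ (as a yyyymmL constant) when wchar_t values are ISO/IEC 10646 code points.

```c
#include <wchar.h>

/*
 * Hypothetical helper: returns 1 if the implementation promises that
 * wchar_t values are ISO/IEC 10646 (Unicode) code points in every
 * locale, 0 if the encoding of wchar_t may depend on the locale.
 */
int
wchar_is_ucs(void)
{
#ifdef __STDC_ISO_10646__
    return 1;
#else
    return 0;
#endif
}
```

Portable code that sees 0 here cannot assume anything about the numeric value of a wchar_t outside the basic character set.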

Soda-san is against it :) I more or less agreed with the idea of
locale-specific encodings 10 years ago, but I've come to the conclusion
that the gains don't justify the price:

(1) The primary reference for data exchange is Unicode. Legacy character
sets still exist and are deployed, but they are certainly exactly that
-- legacy for compatibility with other (older) systems.

(2) The vast majority of all existing character sets can be easily
converted to and from Unicode.

(3) If individual input characters can't be faithfully roundtripped to
Unicode and back, we can just as well assign them private-use code
points. Transliteration is likely needed in this case anyway for
purposes like iconv.

(4) Giving up locale-dependent wchar_t would significantly simplify the
code by allowing a full layer of abstraction to be removed as well as
the associated redundancy of implementations.

(5) It is nearly free for Western character sets and reasonable in
terms of code complexity for Shift-JIS, ISO 2022 and EUC. Big5 is a
mess, but primarily because it needs a large translation table.
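
To make points (3) and (5) concrete, here is a minimal sketch, not taken from any existing libc. All three function names are hypothetical, and the choice of U+E000 + byte as the private-use layout is an assumption made purely for illustration. ISO 8859-1 is used as the "nearly free" case because its 256 values coincide with U+0000..U+00FF.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: decode ISO 8859-1 bytes into UCS code points.
 * Latin-1 occupies U+0000..U+00FF, so each byte is its own code point
 * and no table is needed.
 */
size_t
latin1_to_ucs(const unsigned char *src, size_t n, uint32_t *dst)
{
    assert(src != NULL && dst != NULL);
    for (size_t i = 0; i < n; i++)
        dst[i] = src[i];
    return n;
}

/*
 * Hypothetical fallback for point (3): park a byte that has no Unicode
 * equivalent in the BMP Private Use Area (U+E000..U+F8FF). The layout
 * U+E000 + byte is an assumption of this sketch.
 */
uint32_t
unmapped_to_pua(unsigned char b)
{
    return UINT32_C(0xE000) + b;
}

/*
 * Reverse mapping: recover the original byte from a private-use code
 * point produced above. Returns 0 on success, -1 if cp is outside the
 * range this sketch assigns.
 */
int
pua_to_unmapped(uint32_t cp, unsigned char *b)
{
    assert(b != NULL);
    if (cp < UINT32_C(0xE000) || cp > UINT32_C(0xE0FF))
        return -1;
    *b = (unsigned char)(cp - UINT32_C(0xE000));
    return 0;
}
```

The private-use pair roundtrips by construction, which is all point (3) requires. A table-driven set such as Big5 would replace the identity loop with a lookup, which is where the size cost comes from.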

I still believe the advantages far outweigh the price.

Joerg

