tech-userlevel: Re: Permit loose matching of codeset names in locales

Subject: Re: Permit loose matching of codeset names in locales
To: Curt Sampson <cjs@cynic.net>
From: SODA Noriyuki <soda@sra.co.jp>
List: tech-userlevel
Date: 09/04/2004 02:58:25

>>>>> On Fri, 3 Sep 2004 16:26:14 +0900 (JST),
	Curt Sampson <cjs@cynic.net> said:

> As well as the preferred MIME name, it would be nice to match against
> all the aliases available for the character encoding. For example, the
> official aliases for ISO-8859-1 are:

I believe the idea of using MIME charset names as UNIX codeset names
is a mistake.

Of course we need a way to map between those names, but there is
no real requirement to use MIME charset names as UNIX codeset names.
Allowing case-insensitive names, as MIME charset names are, just adds
needless complexity without any benefit.

On the original topic: although compatibility with Linux locale names
is certainly nice, I believe a simple alias mechanism is good enough
and we should stick with it. The Linux locale matching mechanism is
an over-engineered feature, and what people actually use is a very
small subset of the names Linux supports.
In an ideal world, everyone would use just one codeset name for
one codeset, and although it's too late for such a world, we should
not introduce yet more variants of codeset names.
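
To make "simple alias mechanism" concrete, here is a minimal sketch
in Python, assuming nothing about the real libc implementation. The
table name, the function name, and the alias entries are all
hypothetical and chosen only for illustration. The point is that a
lookup is an exact, case-sensitive match against an explicit list,
with no normalization of the codeset part.

```python
# Hypothetical alias table: each key is an accepted alternate spelling,
# each value is the canonical locale name. Entries are illustrative only.
LOCALE_ALIASES = {
    "en_US.ISO-8859-1": "en_US.ISO8859-1",
    "ja_JP.EUC-JP": "ja_JP.eucJP",
    "ja_JP.ujis": "ja_JP.eucJP",
}


def resolve_locale(name):
    """Return the canonical name for an exact alias match, else the name unchanged."""
    return LOCALE_ALIASES.get(name, name)


if __name__ == "__main__":
    print(resolve_locale("en_US.ISO-8859-1"))  # listed alias
    print(resolve_locale("en_US.iso-8859-1"))  # not listed: case differs
```

A name that is not in the table is passed through untouched, so
supporting a new spelling means adding one line to the table rather
than widening the matching rules.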

>     Name: ISO_8859-1:1987                                    [RFC1345,KXS2]
>     MIBenum: 4
>     Source: ECMA registry
>     Alias: iso-ir-100
>     Alias: ISO_8859-1
>     Alias: ISO-8859-1 (preferred MIME name)
>     Alias: latin1
>     Alias: l1
>     Alias: IBM819
>     Alias: CP819
>     Alias: csISOLatin1

> The full list is in the following IANA document (though they are
> mistakenly called character sets, rather than character encodings):

BTW, the IANA codeset registry now regrets the alias feature, and is
going to reject further additions of new aliases.
--
soda